
Exam DP-200 topic 5 question 8 discussion

Actual exam question from Microsoft's DP-200
Question #: 8
Topic #: 5

A company has a real-time data analysis solution that is hosted on Microsoft Azure. The solution uses Azure Event Hub to ingest data and an Azure Stream
Analytics cloud job to analyze the data. The cloud job is configured to use 120 Streaming Units (SU).
You need to optimize performance for the Azure Stream Analytics job.
Which two actions should you perform? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.

  • A. Implement event ordering
  • B. Scale the SU count for the job up
  • C. Implement Azure Stream Analytics user-defined functions (UDF)
  • D. Scale the SU count for the job down
  • E. Implement query parallelization by partitioning the data output
  • F. Implement query parallelization by partitioning the data input
Suggested Answer: BF
B: Scaling the SU count up gives the job more compute resources to work with.
F: A Stream Analytics job definition includes inputs, a query, and an output. Inputs are where the job reads the data stream from; partitioning the input lets the system process each input partition separately, scaling out the query.
References:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-parallelization
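For illustration, a fully parallel job pairs a partitioned input with a query that partitions on the same key. A minimal sketch of such a query (the input/output aliases and column names here are hypothetical, not from the exam question):

```sql
-- Hypothetical aliases: [eventhub-input] is a partitioned Event Hubs input,
-- [blob-output] is the job's output.
-- PARTITION BY PartitionId lets each Event Hub partition be processed
-- independently, so the job can spread its streaming units across partitions.
SELECT
    DeviceId,
    AVG(Temperature) AS AvgTemperature
INTO
    [blob-output]
FROM
    [eventhub-input] TIMESTAMP BY EventEnqueuedUtcTime
    PARTITION BY PartitionId
GROUP BY
    DeviceId,
    PartitionId,
    TumblingWindow(minute, 1)
```

Note that on compatibility level 1.2 and later, Stream Analytics can parallelize a partitioned input implicitly, so the explicit PARTITION BY PartitionId clause is mainly needed on earlier compatibility levels.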

Comments

Cassielovedata
Highly Voted 4 years, 7 months ago
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-parallelization The link above from Azure has the following statement: "This article shows you how to take advantage of parallelization in Azure Stream Analytics. You learn how to scale Stream Analytics jobs by configuring input partitions and tuning the analytics query definition." I think it implies that once you partition the input, the output will be partitioned according to your query; there is no way to directly partition the output. Besides, the table of contents on the left shows that SU is one way to optimize a Stream Analytics job. Thus, the answer should be B and F.
upvoted 17 times
epgd
Highly Voted 5 years, 3 months ago
I think the correct answers are E (implement query parallelization by partitioning the data output) and F (implement query parallelization by partitioning the data input), because increasing the number of streaming units for a job might not reduce SU% utilization if your query is not fully parallel. And I think 120 SU should be enough, considering that 6 SU is the full capacity of a single computing node. Reference: https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-streaming-unit-consumption
upvoted 10 times
epgd
5 years, 2 months ago
But the question is only about "Azure Event Hub to ingest data and an Azure Stream Analytics cloud job to analyze the data". So B (scale the SU count for the job up) and F (implement query parallelization by partitioning the data input).
upvoted 31 times
dumpsm42
4 years, 5 months ago
hmmm i agree with your first assumption, 120 SU already seems like a large number to me, so i think Microsoft really wants us to choose E and F; even if the data comes from Azure Event Hub with partitions, as stated in the links we must explicitly set the PARTITION BY for the analytics job so.... hm.... E and F. Sorry.
upvoted 1 times
dumpsm42
4 years, 5 months ago
Partitions in inputs and outputs: Partitioning lets you divide data into subsets based on a partition key. If your input (for example, Event Hubs) is partitioned by a key, it is highly recommended to specify this partition key when adding the input to your Stream Analytics job. Scaling a Stream Analytics job takes advantage of partitions in the input and output. A Stream Analytics job can consume and write different partitions in parallel, which increases throughput.
upvoted 1 times
Qrm_1972
Most Recent 3 years, 12 months ago
I have seen this question and the answer on other sites; the correct answer is 100% BF.
upvoted 2 times
drosen
6 months, 4 weeks ago
Increasing the number of streaming units (SU) is not exactly an optimization, but rather an improvement by adding resources. Focusing on optimizing with current resources is more accurate. In that case, the best options would be:
  • Implement query parallelization by partitioning the data input: distributes the initial workload, improving performance without the need to add resources.
  • Implement query parallelization by partitioning the data output: ensures that data processing and delivery are also distributed efficiently, optimizing the entire workflow.
upvoted 1 times
akram786
4 years, 2 months ago
E and F as the answer for optimization
upvoted 2 times
syu31svc
4 years, 5 months ago
Streaming Units (SUs) represent the computing resources allocated to execute a Stream Analytics job. The higher the number of SUs, the more CPU and memory resources are allocated for your job. Scaling up does increase performance, but it is not a good way to optimize jobs. I would say E and F as the answer.
upvoted 2 times
syu31svc
4 years, 5 months ago
Changing to B and E since the input is already partitioned
upvoted 1 times
Andrexx
4 years, 6 months ago
In my opinion the answer is correct. The cloud job is already configured to use 120 Streaming Units (SU), so if we partition the input, we must scale the SU count up to support the extra load that this partitioning will generate. Makes sense?
upvoted 8 times
ADHDBA
5 years, 4 months ago
How is adding extra streaming units optimizing performance? Is adding RAM on a SQL Server optimizing performance?
upvoted 2 times
damew26089
4 years, 11 months ago
If it improves the performance of jobs on the SQL Server, sure?
upvoted 4 times
diulin
4 years, 11 months ago
I agree with ADHDBA, the question is about optimizing and not improving performance. E and F. https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-parallelization says: " If your input (for example Event Hubs) is partitioned by a key, it is highly recommended to specify this partition key when adding input to your Stream Analytics job. Scaling a Stream Analytics job takes advantage of partitions in the input and output. A Stream Analytics job can consume and write different partitions in parallel, which increases throughput."
upvoted 2 times
Community vote distribution: A (35%), C (25%), B (20%), Other