
Exam DP-203 topic 2 question 43 discussion

Actual exam question from Microsoft's DP-203
Question #: 43
Topic #: 2

HOTSPOT -
You have an Azure Storage account that generates 200,000 new files daily. The file names have a format of {YYYY}/{MM}/{DD}/{HH}/{CustomerID}.csv.
You need to design an Azure Data Factory solution that will load new data from the storage account to an Azure Data Lake once hourly. The solution must minimize load times and costs.
How should you configure the solution? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:

Suggested Answer:
Box 1: Incremental load -

Box 2: Tumbling window -
Tumbling windows are a series of fixed-sized, non-overlapping, and contiguous time intervals. (The linked documentation illustrates how a stream of events is mapped into 10-second tumbling windows.)

Reference:
https://docs.microsoft.com/en-us/stream-analytics-query/tumbling-window-azure-stream-analytics
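For intuition, here is a minimal sketch (illustrative, not part of the suggested answer) of why an hourly tumbling window pairs naturally with the {YYYY}/{MM}/{DD}/{HH} file layout: each window is fixed-size, non-overlapping, and contiguous, and its start time identifies exactly one folder of new files to load incrementally. The dates are hypothetical.

```python
from datetime import datetime, timedelta

def hourly_windows(start: datetime, end: datetime):
    """Yield fixed-size, non-overlapping, contiguous one-hour tumbling
    windows as (window_start, window_end) pairs covering [start, end)."""
    cursor = start
    while cursor < end:
        yield cursor, cursor + timedelta(hours=1)
        cursor += timedelta(hours=1)

def window_prefix(window_start: datetime) -> str:
    """Map a window start to the {YYYY}/{MM}/{DD}/{HH} folder holding
    the files written during that hour."""
    return window_start.strftime("%Y/%m/%d/%H")

for ws, we in hourly_windows(datetime(2024, 3, 5, 0), datetime(2024, 3, 5, 3)):
    print(ws.time(), "->", we.time(), "loads prefix", window_prefix(ws))
# 00:00:00 -> 01:00:00 loads prefix 2024/03/05/00
# 01:00:00 -> 02:00:00 loads prefix 2024/03/05/01
# 02:00:00 -> 03:00:00 loads prefix 2024/03/05/02
```

Each hourly run touches only one folder's worth of new files, which is what makes the incremental load cheap.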

Comments

onyerleft
Highly Voted 2 years, 10 months ago
1) Incremental Load, 2) Tumbling Window. Seems like you could go with either a Schedule trigger or a Tumbling Window here. I would use the latter and pass the windowStart system variable to the pipeline as a parameter, allowing me to more easily navigate to the proper directory in the storage account.
upvoted 32 times
mav2000
8 months, 1 week ago
I believe it's only Tumbling window, because if you chose a fixed schedule you wouldn't know which files to load. The tumbling window tells you what happened in that window of time, and therefore which files to load.
upvoted 1 times
MBRSDG
7 months ago
Interesting. Do you have references to support that behaviour? From the MS documentation, I can't find that Data Factory tracks file creation inside a tumbling window; the docs make it look like a matter of scheduling rather than what you're describing.
upvoted 1 times
_Ahan_
5 months, 1 week ago
https://learn.microsoft.com/en-us/azure/data-factory/how-to-create-tumbling-window-trigger?tabs=data-factory%2Cazure-powershell
upvoted 2 times
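To make onyerleft's suggestion concrete: per the tumbling window trigger doc _Ahan_ links, the trigger exposes windowStartTime and windowEndTime outputs that can be bound to pipeline parameters. A minimal sketch of that binding, mirrored as a Python dict for illustration (the real artifact is ADF JSON, and the pipeline and parameter names here are hypothetical):

```python
# Sketch of the pipeline-parameter binding in a tumbling window trigger.
# "LoadHourlyFiles", "windowStart", and "windowEnd" are illustrative names.
trigger_pipeline_binding = {
    "pipelineReference": {
        "referenceName": "LoadHourlyFiles",
        "type": "PipelineReference",
    },
    "parameters": {
        # Trigger outputs documented for tumbling window triggers:
        "windowStart": "@{trigger().outputs.windowStartTime}",
        "windowEnd": "@{trigger().outputs.windowEndTime}",
    },
}

# Inside the pipeline, the dataset folder path can then be derived from
# the parameter with an expression along the lines of:
#   @formatDateTime(pipeline().parameters.windowStart, 'yyyy/MM/dd/HH')
```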
Gikan
9 months ago
Yes, it is true. If the Data Factory contains more than one pipeline and I want to trigger them together, the schedule trigger is the only solution: "Supports many-to-many relationships. Multiple triggers can kick off a single pipeline. A single trigger can kick off multiple pipelines." https://learn.microsoft.com/en-us/azure/data-factory/concepts-pipeline-execution-triggers
upvoted 1 times
xcsakubara
Highly Voted 2 years, 8 months ago
Since we are loading NEW data and not going back in time, it should be Schedule, as we are scheduling it for every hour in the future. It would have been Tumbling if we had scheduled it for every hour in the past.
upvoted 17 times
phydev
1 year ago
Besides, a scheduled trigger is a better option for this specific scenario than a tumbling window due to precision, efficiency and cost savings.
upvoted 2 times
positivitypeople
Most Recent 10 months, 1 week ago
Got this question today on the exam
upvoted 4 times
Momoanwar
10 months, 3 weeks ago
Correct. ChatGPT: For the scenario described, to load data from an Azure Storage account to an Azure Data Lake hourly while minimizing load times and costs, you would configure the Azure Data Factory solution as follows:
- **Load methodology**: Incremental load. Because you are loading new data every hour and the goal is to minimize load times and costs, you would incrementally load only the new data that has arrived since the last load.
- **Trigger**: Tumbling window. This trigger is suitable for fixed-duration, repeating intervals in Azure Data Factory, which fits the requirement of loading data hourly. Using a tumbling window trigger ensures that each window of time is processed once and only once, and by doing an incremental load you process only the new data that has appeared since the last hour rather than reprocessing all existing data.
upvoted 2 times
Andrew_Chen
1 year ago
I think one very important thing here is that the tumbling window trigger manages state between runs, which means data will not be counted twice.
upvoted 1 times
auwia
1 year, 4 months ago
In Azure Data Factory Studio, when you create a new trigger you can choose a TYPE in ('Schedule', 'Tumbling window', 'Storage events', 'Custom events'). We should exclude "Fixed schedule" because of 'fixed'! :) So my final answer is Incremental Load and Tumbling Window.
upvoted 6 times
vedantnj
1 year, 5 months ago
Hi there
upvoted 7 times
Rossana
1 year, 6 months ago
To minimize load times and costs for loading new data from the storage account to an Azure Data Lake once hourly, you should configure the solution to use incremental load and a trigger based on new files arriving.

Load methodology: With 200,000 new files generated daily, a full load every hour could be time-consuming and expensive. Incremental load is the better option in this scenario because it loads only new or changed data since the last successful execution of the pipeline, which can significantly reduce load times and costs.

Trigger: A trigger based on new files arriving is the most efficient option because it runs the pipeline only when new files are detected in the storage account. This avoids unnecessary pipeline executions and reduces costs. A fixed schedule trigger runs the pipeline at fixed intervals regardless of whether there is new data to process. A tumbling window trigger runs the pipeline at specified intervals but still processes all data within the window, whether or not there is new data. Therefore, a new-file trigger is the best option in this scenario.
upvoted 5 times
mav2000
8 months, 1 week ago
Wrong; the question specifies that it has to run hourly, so it's tumbling window.
upvoted 2 times
martcerv
1 year, 10 months ago
"A schedule for an activity creates a series of tumbling windows within the pipeline start and end times." I think it is "Fixed schedule", because "tumbling windows" are more related to Stream Analytics questions according to the MS docs. https://learn.microsoft.com/en-us/azure/data-factory/v1/data-factory-scheduling-and-execution
upvoted 6 times
Deeksha1234
2 years, 2 months ago
1) Incremental Load 2) Tumbling Window
upvoted 4 times
jskibick
2 years, 5 months ago
With a schedule trigger, executions can overlap if the process does not finish within one hour. A tumbling window is better: with the concurrency setting it can allow only one ongoing execution.
upvoted 14 times
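A small sketch of jskibick's point, assuming the documented tumbling window trigger properties (mirrored here as a Python dict; the real trigger definition is ADF JSON):

```python
# Relevant fields of a tumbling window trigger definition.
tumbling_trigger_properties = {
    "frequency": "Hour",
    "interval": 1,
    # With maxConcurrency set to 1, the next window waits until the current
    # run finishes, so hourly executions never overlap. A schedule trigger
    # has no equivalent setting: a run that takes longer than an hour
    # simply overlaps the next scheduled run.
    "maxConcurrency": 1,
}
```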
Massy
2 years, 7 months ago
Both Tumbling Window and Schedule trigger will achieve the goal. Which one is more cost-effective?
upvoted 1 times
Boompiee
2 years, 5 months ago
I think it's because every hour you're only processing the past hour's data. With a tumbling window you can define which messages to process, whereas with a schedule trigger you'd have to implement that filter separately.
upvoted 2 times
xcsakubara
2 years, 8 months ago
Why not a schedule trigger?
upvoted 1 times
sparkchu
2 years, 7 months ago
For backfill purposes? Just guessing.
upvoted 1 times
jv2120
2 years, 10 months ago
incremental, fixed schedule every hour.
upvoted 6 times
jv2120
2 years, 10 months ago
Correct answer: tumbling window.
upvoted 1 times
Ayan3B
2 years, 10 months ago
As input we are receiving CSV files, so why not a trigger mechanism that runs the pipeline when a file arrives?
upvoted 2 times
ItHYMeRIsh
2 years, 10 months ago
The question says, "load new data from the storage account to the Azure Data Lake once hourly." This already indicates a tumbling window that runs every hour. On top of that, if you executed this as an event every time a file arrived, you'd have 200,000 ADF pipeline executions per day, one per file. If you ran the pipeline once per hour, you'd have just 24. 1,000 ADF runs cost $1, so one day on a tumbling window is 24 runs, or 2.4 cents, while 200,000 runs would be $200/day. This excludes other costs. https://azure.microsoft.com/en-us/pricing/details/data-factory/data-pipeline/
upvoted 36 times
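ItHYMeRIsh's arithmetic, written out. This assumes the $1 per 1,000 pipeline runs rate quoted in the comment and ignores other charges (activity runs, data movement, and so on):

```python
price_per_run = 1.00 / 1000  # $1 per 1,000 ADF pipeline runs (quoted rate)

files_per_day = 200_000
per_file_cost = files_per_day * price_per_run  # one run per arriving file
hourly_cost = 24 * price_per_run               # one run per hourly window

print(f"per-file event trigger: ${per_file_cost:,.2f}/day")  # $200.00/day
print(f"hourly tumbling window: ${hourly_cost:.3f}/day")     # $0.024/day
```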
ANath
2 years, 9 months ago
That's correct. Well explained
upvoted 2 times