
Exam DP-203 topic 2 question 43 discussion

Actual exam question from Microsoft's DP-203
Question #: 43
Topic #: 2

HOTSPOT -
You have an Azure Storage account that generates 200,000 new files daily. The file names have a format of {YYYY}/{MM}/{DD}/{HH}/{CustomerID}.csv.
You need to design an Azure Data Factory solution that will load new data from the storage account to an Azure Data Lake once hourly. The solution must minimize load times and costs.
How should you configure the solution? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:

Suggested Answer:
Box 1: Incremental load -

Box 2: Tumbling window -
Tumbling windows are a series of fixed-sized, non-overlapping, and contiguous time intervals. (The linked documentation illustrates how a stream of events is mapped into 10-second tumbling windows.)

Reference:
https://docs.microsoft.com/en-us/stream-analytics-query/tumbling-window-azure-stream-analytics
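For intuition, here is a minimal sketch (illustrative, not part of the suggested answer) of why an hourly tumbling window pairs naturally with the {YYYY}/{MM}/{DD}/{HH} file layout: each window is fixed-size, non-overlapping, and contiguous, and its start time identifies exactly one folder of new files to load incrementally. The dates are hypothetical.

```python
from datetime import datetime, timedelta

def hourly_windows(start: datetime, end: datetime):
    """Yield fixed-size, non-overlapping, contiguous one-hour tumbling
    windows as (window_start, window_end) pairs covering [start, end)."""
    cursor = start
    while cursor < end:
        yield cursor, cursor + timedelta(hours=1)
        cursor += timedelta(hours=1)

def window_prefix(window_start: datetime) -> str:
    """Map a window start to the {YYYY}/{MM}/{DD}/{HH} folder holding
    the files written during that hour."""
    return window_start.strftime("%Y/%m/%d/%H")

for ws, we in hourly_windows(datetime(2024, 3, 5, 0), datetime(2024, 3, 5, 3)):
    print(ws.time(), "->", we.time(), "loads prefix", window_prefix(ws))
# 00:00:00 -> 01:00:00 loads prefix 2024/03/05/00
# 01:00:00 -> 02:00:00 loads prefix 2024/03/05/01
# 02:00:00 -> 03:00:00 loads prefix 2024/03/05/02
```

Each hourly run touches only one folder's worth of new files, which is what makes the incremental load cheap.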

Comments

onyerleft
Highly Voted 2 years, 10 months ago
1) Incremental Load, 2) Tumbling Window. Seems like you could go with either a Schedule trigger or a Tumbling Window here. I would use the latter and pass the windowStart system variable to the pipeline as a parameter, allowing me to more easily navigate to the proper directory in the storage account.
upvoted 32 times
mav2000
8 months, 1 week ago
I believe it's only Tumbling window, because if you chose a fixed schedule you wouldn't know which files to load. The tumbling window tells you what happened in that window of time, and therefore which files to load.
upvoted 1 times
MBRSDG
7 months ago
Interesting. Do you have references to support that behaviour? From the MS documentation, I can't find that Data Factory tracks file creation inside a tumbling window; the docs make it look like a matter of scheduling rather than what you're describing.
upvoted 1 times
_Ahan_
5 months, 1 week ago
https://learn.microsoft.com/en-us/azure/data-factory/how-to-create-tumbling-window-trigger?tabs=data-factory%2Cazure-powershell
upvoted 2 times
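To make onyerleft's suggestion concrete: per the tumbling window trigger doc _Ahan_ links, the trigger exposes windowStartTime and windowEndTime outputs that can be bound to pipeline parameters. A minimal sketch of that binding, mirrored as a Python dict for illustration (the real artifact is ADF JSON, and the pipeline and parameter names here are hypothetical):

```python
# Sketch of the pipeline-parameter binding in a tumbling window trigger.
# "LoadHourlyFiles", "windowStart", and "windowEnd" are illustrative names.
trigger_pipeline_binding = {
    "pipelineReference": {
        "referenceName": "LoadHourlyFiles",
        "type": "PipelineReference",
    },
    "parameters": {
        # Trigger outputs documented for tumbling window triggers:
        "windowStart": "@{trigger().outputs.windowStartTime}",
        "windowEnd": "@{trigger().outputs.windowEndTime}",
    },
}

# Inside the pipeline, the dataset folder path can then be derived from
# the parameter with an expression along the lines of:
#   @formatDateTime(pipeline().parameters.windowStart, 'yyyy/MM/dd/HH')
```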
Gikan
9 months ago
Yes, it is true. If the Data Factory contains more than one pipeline and I want to trigger them together, the schedule trigger is the only solution: "Supports many-to-many relationships. Multiple triggers can kick off a single pipeline. A single trigger can kick off multiple pipelines." https://learn.microsoft.com/en-us/azure/data-factory/concepts-pipeline-execution-triggers
upvoted 1 times
xcsakubara
Highly Voted 2 years, 8 months ago
Since we are loading NEW data and not going back in time, it should be Schedule, as we are scheduling it for every hour in the future. It would have been Tumbling if we had scheduled it for every hour in the past.
upvoted 17 times
phydev
1 year ago
Besides, a scheduled trigger is a better option for this specific scenario than a tumbling window due to precision, efficiency and cost savings.
upvoted 2 times
positivitypeople
Most Recent 10 months, 1 week ago
Got this question today on the exam
upvoted 4 times
Momoanwar
10 months, 3 weeks ago
Correct. ChatGPT: For the scenario described, to load data from an Azure Storage account to an Azure Data Lake hourly while minimizing load times and costs, you would configure the Azure Data Factory solution as follows:
- **Load methodology**: Incremental load. Because you are loading new data every hour and the goal is to minimize load times and costs, you would incrementally load only the new data that has arrived since the last load.
- **Trigger**: Tumbling window. This trigger is suitable for fixed-duration, repeating intervals in Azure Data Factory, which fits the requirement of loading data hourly. Using a tumbling window trigger ensures that each window of time is processed once and only once, and by doing an incremental load you process only the new data that has appeared since the last hour rather than reprocessing all existing data.
upvoted 2 times
Andrew_Chen
1 year ago
I think one very important thing here is that the tumbling window trigger manages state between runs, which means data will not be counted twice.
upvoted 1 times
auwia
1 year, 4 months ago
In Azure Data Factory Studio, when you create a new trigger you can choose a TYPE in ('Schedule', 'Tumbling window', 'Storage events', 'Custom events'). We should exclude "Fixed schedule" because of 'fixed'! :) So my final answer is Incremental Load and Tumbling Window.
upvoted 6 times
vedantnj
1 year, 5 months ago
Hi there
upvoted 7 times
Rossana
1 year, 6 months ago
To minimize load times and costs for loading new data from the storage account to an Azure Data Lake once hourly, you should configure the solution to use incremental load and a trigger based on new files arriving.

Load methodology: With 200,000 new files generated daily, a full load every hour could be time-consuming and expensive. Incremental load is the better option in this scenario because it loads only new or changed data since the last successful execution of the pipeline, which can significantly reduce load times and costs.

Trigger: A trigger based on new files arriving is the most efficient option because it runs the pipeline only when new files are detected in the storage account. This avoids unnecessary pipeline executions and reduces costs. A fixed schedule trigger runs the pipeline at fixed intervals regardless of whether there is new data to process. A tumbling window trigger runs the pipeline at specified intervals but still processes all data within the window, whether or not there is new data. Therefore, a new-file trigger is the best option in this scenario.
upvoted 5 times
mav2000
8 months, 1 week ago
Wrong; the question specifies that it has to run hourly, so it's tumbling window.
upvoted 2 times
martcerv
1 year, 10 months ago
"A schedule for an activity creates a series of tumbling windows within the pipeline start and end times." I think it is "Fixed schedule", because "tumbling windows" are more related to Stream Analytics questions according to the MS docs. https://learn.microsoft.com/en-us/azure/data-factory/v1/data-factory-scheduling-and-execution
upvoted 6 times
Deeksha1234
2 years, 2 months ago
1) Incremental Load 2) Tumbling Window
upvoted 4 times
jskibick
2 years, 5 months ago
With a schedule trigger, executions can overlap if the process does not finish within one hour. A tumbling window is better: with the concurrency setting it can allow only one ongoing execution.
upvoted 14 times
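A small sketch of jskibick's point, assuming the documented tumbling window trigger properties (mirrored here as a Python dict; the real trigger definition is ADF JSON):

```python
# Relevant fields of a tumbling window trigger definition.
tumbling_trigger_properties = {
    "frequency": "Hour",
    "interval": 1,
    # With maxConcurrency set to 1, the next window waits until the current
    # run finishes, so hourly executions never overlap. A schedule trigger
    # has no equivalent setting: a run that takes longer than an hour
    # simply overlaps the next scheduled run.
    "maxConcurrency": 1,
}
```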
Massy
2 years, 7 months ago
Both Tumbling Window and Schedule trigger will achieve the goal. Which one is more cost-effective?
upvoted 1 times
Boompiee
2 years, 5 months ago
I think it's because every hour you're only processing the past hour's data. With a tumbling window you can define which messages to process, whereas with a schedule trigger you'd have to implement that filter separately.
upvoted 2 times
xcsakubara
2 years, 8 months ago
Why not a schedule trigger?
upvoted 1 times
sparkchu
2 years, 7 months ago
For backfill purposes? Just guessing.
upvoted 1 times
jv2120
2 years, 10 months ago
incremental, fixed schedule every hour.
upvoted 6 times
jv2120
2 years, 10 months ago
Correct answer: tumbling window.
upvoted 1 times
Ayan3B
2 years, 10 months ago
As input we are receiving CSV files, so why not a trigger mechanism that runs the pipeline when a file arrives?
upvoted 2 times
ItHYMeRIsh
2 years, 10 months ago
The question says, "load new data from the storage account to the Azure Data Lake once hourly." This already indicates a tumbling window that runs every hour. On top of that, if you executed this as an event every time a file arrived, you'd have 200,000 ADF pipeline executions per day, one per file. If you ran the pipeline once per hour, you'd have just 24. 1,000 ADF runs cost $1, so one day on a tumbling window is 24 runs, or 2.4 cents, while 200,000 runs would be $200/day. This excludes other costs. https://azure.microsoft.com/en-us/pricing/details/data-factory/data-pipeline/
upvoted 36 times
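ItHYMeRIsh's arithmetic, written out. This assumes the $1 per 1,000 pipeline runs rate quoted in the comment and ignores other charges (activity runs, data movement, and so on):

```python
price_per_run = 1.00 / 1000  # $1 per 1,000 ADF pipeline runs (quoted rate)

files_per_day = 200_000
per_file_cost = files_per_day * price_per_run  # one run per arriving file
hourly_cost = 24 * price_per_run               # one run per hourly window

print(f"per-file event trigger: ${per_file_cost:,.2f}/day")  # $200.00/day
print(f"hourly tumbling window: ${hourly_cost:.3f}/day")     # $0.024/day
```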
ANath
2 years, 9 months ago
That's correct. Well explained
upvoted 2 times