Exam DP-201 topic 2 question 37 discussion

Actual exam question from Microsoft's DP-201
Question #: 37
Topic #: 2

You are designing a solution that will copy Parquet files stored in an Azure Blob storage account to an Azure Data Lake Storage Gen2 account.
The data will be loaded daily to the data lake and will use a folder structure of {Year}/{Month}/{Day}/.
You need to design a daily Azure Data Factory data load to minimize the data transfer between the two accounts.
Which two configurations should you include in the design? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.

  • A. Delete the files in the destination before loading new data.
  • B. Filter by the last modified date of the source files.
  • C. Delete the source files after they are copied.
  • D. Specify a file naming pattern for the destination.
Suggested Answer: BC
B: To copy a subset of files under a folder, specify folderPath with a folder part and fileName with a wildcard filter.
C: After completion: Choose to do nothing with the source file after the data flow runs, delete the source file, or move the source file. The paths for the move are relative.
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-data-lake-storage
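To make the two suggested knobs concrete, here is a minimal Python sketch of the same daily incremental-copy pattern implemented outside ADF with the azure-storage-blob and azure-storage-file-datalake SDKs. The connection strings, container, and file-system names are placeholders; inside ADF itself, option B corresponds to the last-modified filter (modifiedDatetimeStart/modifiedDatetimeEnd) on the copy source. This is an illustration of the idea, not the ADF implementation.

```python
from datetime import datetime, timedelta, timezone

from azure.storage.blob import BlobServiceClient
from azure.storage.filedatalake import DataLakeServiceClient

# Placeholders -- substitute real connection strings and names.
SRC_CONN = "<blob-connection-string>"
DST_CONN = "<adls-gen2-connection-string>"

src = BlobServiceClient.from_connection_string(SRC_CONN).get_container_client("landing")
dst = DataLakeServiceClient.from_connection_string(DST_CONN).get_file_system_client("lake")

# Option B: only pick up files modified since the previous daily run --
# this is the part that minimizes transfer between the two accounts.
cutoff = datetime.now(timezone.utc) - timedelta(days=1)

for blob in src.list_blobs():
    if blob.last_modified < cutoff or not blob.name.endswith(".parquet"):
        continue
    # Write into the {Year}/{Month}/{Day}/ layout from the question.
    d = blob.last_modified
    dest = f"{d:%Y}/{d:%m}/{d:%d}/{blob.name.rsplit('/', 1)[-1]}"
    data = src.download_blob(blob.name).readall()
    dst.get_file_client(dest).upload_data(data, overwrite=True)
```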

Comments

phi618t
Highly Voted 3 years, 12 months ago
If you choose C (delete the source files after they are copied), why would you also need B (filter by the last modified date of the source files)? I prefer BD.
upvoted 12 times
Marcus1612
3 years, 8 months ago
This is a basic question: copy data from one place to another. The requirements are: 1) minimize the data transfer, and 2) adapt the data to the destination folder structure. Filtering on LastModifiedDate copies everything that has changed since the latest load while minimizing the data transfer. Specifying the file naming pattern puts the data in the right place in the destination data lake. The answer is BD.
upvoted 2 times
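Marcus1612's second point, the naming pattern, is usually implemented as a dynamic folder expression on the sink dataset. A rough Python equivalent of what such an expression computes (the ADF-side expression would be something like @formatDateTime(utcnow(), 'yyyy/MM/dd'); a UTC daily trigger is an assumption here):

```python
from datetime import datetime, timezone

# Rough Python equivalent of an ADF sink folder expression such as
# @formatDateTime(utcnow(), 'yyyy/MM/dd') -- assumes a UTC daily trigger.
folder = datetime.now(timezone.utc).strftime("%Y/%m/%d")
print(folder)  # e.g. "2021/03/15"
```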
Bhagya123456
3 years, 9 months ago
How is a naming pattern going to minimize the data transfer? BC should be the correct answer.
upvoted 1 times
Wendy_DK
Highly Voted 4 years ago
The correct answer is BC. In the source options of the copy activity there are three choices: 1. No action, 2. Delete source files, 3. Move.
upvoted 10 times
BigMF
Most Recent 3 years, 11 months ago
A is obviously out, and you're not going to do both B and C, so D is in by default. Your only choice at that point is B or C to go along with D. In my experience, you cannot rely 100% on any job to run every single day (assuming this process is daily). If the job does not run for one or more days and you chose B, you would only copy over the most recent files, and there would be files left behind in the storage account. Therefore, my choice would be to not filter, load everything that is in the storage account, and then delete the files once they have been copied. So C and D are my choices.
upvoted 4 times
YLiu
3 years, 8 months ago
B ensures minimized data transfer. If it copies everything every time, then data transfer is not minimized.
upvoted 1 times
mter2007
4 years, 1 month ago
I would like to choose CD.
upvoted 3 times
maciejt
4 years, 1 month ago
There was no requirement about what to do with the original files, so why in the world answer C, delete them???
upvoted 3 times
BobFar
4 years ago
I guess to make sure you don't read the file again!
upvoted 1 times
Nik71
4 years, 2 months ago
C does not seem correct; for deletion you can use lifecycle management in the storage account, so D should be the second answer.
upvoted 2 times
AlexD332
4 years, 2 months ago
I thought it was the only logical choice, but they said copy activity, not moving files.
upvoted 1 times
H_S
4 years, 2 months ago
I think it"s BD
upvoted 22 times
etl
4 years, 2 months ago
Wildcard path: Using a wildcard pattern will instruct ADF to loop through each matching folder and file in a single Source transformation. This is an effective way to process multiple files within a single flow. Add multiple wildcard matching patterns with the + sign that appears when hovering over your existing wildcard pattern. From your source container, choose a series of files that match a pattern. Only the container can be specified in the dataset; your wildcard path must therefore also include your folder path from the root folder.
upvoted 1 times
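As a rough illustration of the wildcard idea above, glob-style matching in plain Python behaves similarly. Note that fnmatch semantics are close to, but not identical to, ADF's matcher (for one, fnmatch's * also crosses folder separators); this is only a sketch of the concept:

```python
from fnmatch import fnmatch

paths = [
    "2021/03/15/sales.parquet",
    "2021/03/15/readme.txt",
    "archive/2020/12/01/sales.parquet",
]

# Wildcard path rooted at the container, as described above.
pattern = "2021/*/*/*.parquet"

for p in paths:
    print(p, fnmatch(p, pattern))
# Only "2021/03/15/sales.parquet" matches.
```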
etl
4 years, 2 months ago
Yes, BD.. I think you are right.
upvoted 4 times
maciejt
4 years, 1 month ago
But this applies to finding the source files, and D was about a destination file naming pattern... and there was no requirement to change the file name.
upvoted 2 times
cadio30
4 years ago
Agree with answers B and D, as this kind of setup doesn't perform any deletion in either storage account, which lessens the processing.
upvoted 3 times