Exam DP-203 topic 2 question 100 discussion

Actual exam question from Microsoft's DP-203
Question #: 100
Topic #: 2

HOTSPOT

You have an Azure Blob storage account that contains a folder. The folder contains 120,000 files. Each file contains 62 columns.

Each day, 1,500 new files are added to the folder.

You plan to incrementally load five data columns from each new file into an Azure Synapse Analytics workspace.

You need to minimize how long it takes to perform the incremental loads.

What should you use to store the files and in which format? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Suggested Answer:

Comments

ababatunde_hs
Highly Voted 2 years, 1 month ago
Time partitioning is correct: it is the fastest way to load only the new files, but it requires that the time-slice information be part of the file or folder name (https://learn.microsoft.com/en-us/azure/data-factory/tutorial-incremental-copy-overview). Parquet is the correct file format, since it is columnar; a rough sketch of the idea follows this comment.
upvoted 53 times
...
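A minimal PySpark sketch of the time-partitioning idea described above, assuming a hypothetical date-partitioned folder layout; the storage account, container, table, and column names are placeholders, not part of the question:

from datetime import date
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical layout: new files land under ingest/YYYY/MM/DD/, so the folder
# name itself carries the time slice needed for an incremental load.
today = date.today()
daily_path = (
    "abfss://files@examplestorage.dfs.core.windows.net/"
    f"ingest/{today:%Y/%m/%d}/*.parquet"
)

# Parquet is columnar, so selecting 5 of the 62 columns reads only those
# column chunks instead of scanning every row in full.
wanted = ["col_a", "col_b", "col_c", "col_d", "col_e"]  # placeholder names
df = spark.read.parquet(daily_path).select(*wanted)

df.write.mode("append").saveAsTable("staging_incremental_load")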
kkk5566
Highly Voted 1 year, 9 months ago
Time partitioning and parquet
upvoted 9 times
...
suranga4
Most Recent 7 months ago
Answer should be Timeslicing and Parquet
upvoted 1 times
...
MBRSDG
1 year, 2 months ago
Parquet is the answer to the second question. You need to take only 5 columns out of 62: with CSV you would have to scan the file row by row, sequentially. Parquet is more efficient, since selection proceeds column-wise.
upvoted 1 times
MBRSDG
1 year, 2 months ago
Just a note: this answer should be backed by a benchmark. I didn't find any benchmark comparing the performance of the two formats. Logically, I'd expect Parquet to be a lot more efficient, but it would have to be measured in practice (a rough sketch of such a comparison follows this thread).
upvoted 2 times
...
...
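A rough, illustrative micro-benchmark sketch of the comparison mentioned above (not a rigorous test); the file and column names are placeholders:

import time
import pandas as pd

wanted = ["col_a", "col_b", "col_c", "col_d", "col_e"]  # placeholder names

start = time.perf_counter()
# CSV is row-oriented: the parser still scans every row in full, even though
# usecols discards the other 57 columns afterwards.
csv_df = pd.read_csv("wide_sample.csv", usecols=wanted)
print(f"CSV:     {time.perf_counter() - start:.3f} s")

start = time.perf_counter()
# Parquet is columnar: only the requested column chunks are read from disk.
parquet_df = pd.read_parquet("wide_sample.parquet", columns=wanted)
print(f"Parquet: {time.perf_counter() - start:.3f} s")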
vctrhugo
1 year, 11 months ago
You need to minimize how long it takes to perform the incremental loads. With Parquet, which is a columnar format, it is much faster to select a few columns than with CSV.
upvoted 2 times
...
vegeta379
2 years ago
We can do an incremental load of Parquet files only with a Delta table, which is supported by Databricks or Synapse Spark; no such details are given here, so I think it will be CSV. (A rough sketch of that Delta approach follows.)
upvoted 1 times
...
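For reference, a minimal sketch of the Delta-table approach the comment above refers to, assuming a Synapse Spark (or Databricks) pool with Delta Lake available; all paths and column names are placeholders:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read one day's newly arrived Parquet files (placeholder path).
new_files = spark.read.parquet(
    "abfss://files@examplestorage.dfs.core.windows.net/ingest/2024/01/15/*.parquet"
)

# Append only the five needed columns to a Delta table; Delta stores its data
# as Parquet underneath, so the columnar read benefits still apply.
(
    new_files
    .select("col_a", "col_b", "col_c", "col_d", "col_e")  # placeholder names
    .write
    .format("delta")
    .mode("append")
    .save("abfss://files@examplestorage.dfs.core.windows.net/curated/incremental_delta")
)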
pavankr
2 years ago
I think the requirement is to select specific columns, hence CSV?
upvoted 1 times
...
verisdev
2 years ago
It's supposed to be Parquet instead of CSV.
upvoted 5 times
...