
Exam DP-203 topic 1 question 16 discussion

Actual exam question from Microsoft's DP-203
Question #: 16
Topic #: 1

HOTSPOT -
You have two Azure Storage accounts named Storage1 and Storage2. Each account holds one container and has the hierarchical namespace enabled. The system has files that contain data stored in the Apache Parquet format.
You need to copy folders and files from Storage1 to Storage2 by using a Data Factory copy activity. The solution must meet the following requirements:
✑ No transformations must be performed.
✑ The original folder structure must be retained.
✑ Minimize time required to perform the copy activity.
How should you configure the copy activity? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area: [answer area image not shown: select the source dataset type and the copy activity copy behavior]

Suggested Answer:
Box 1: Parquet -
For Parquet datasets, the type property of the copy activity source must be set to ParquetSource.

Box 2: PreserveHierarchy -
PreserveHierarchy (default): Preserves the file hierarchy in the target folder. The relative path of the source file to the source folder is identical to the relative path of the target file to the target folder.
Incorrect Answers:
✑ FlattenHierarchy: All files from the source folder are in the first level of the target folder. The target files have autogenerated names.
✑ MergeFiles: Merges all files from the source folder to one file. If the file name is specified, the merged file name is the specified name. Otherwise, it's an autogenerated file name.
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/format-parquet
https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-data-lake-storage
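For reference, a minimal sketch of what the copy activity JSON could look like for the suggested answer, assuming ADLS Gen2 (AzureBlobFS) store settings; the activity and dataset names are illustrative placeholders, not part of the question:

    {
      "name": "CopyStorage1ToStorage2",
      "type": "Copy",
      "inputs": [ { "referenceName": "Storage1ParquetDataset", "type": "DatasetReference" } ],
      "outputs": [ { "referenceName": "Storage2ParquetDataset", "type": "DatasetReference" } ],
      "typeProperties": {
        "source": {
          "type": "ParquetSource",
          "storeSettings": { "type": "AzureBlobFSReadSettings", "recursive": true }
        },
        "sink": {
          "type": "ParquetSink",
          "storeSettings": { "type": "AzureBlobFSWriteSettings", "copyBehavior": "PreserveHierarchy" }
        }
      }
    }

The copyBehavior property on the sink's store settings is what determines whether the source folder structure is kept (PreserveHierarchy), flattened (FlattenHierarchy), or merged into a single file (MergeFiles).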

Comments

EddyRoboto
Highly Voted 8 months, 2 weeks ago
This could be Binary as both source and sink, since there are no transformations on the files. I tend to believe Binary is the correct answer.
upvoted 85 times
ayn1
2 months, 2 weeks ago
Yes, and Binary is faster because it is a direct copy; a Parquet dataset is parsed on read and rewritten at the destination, which can be slower.
upvoted 1 times
...
GameLift
3 years, 7 months ago
But the doc says "When using Binary dataset in copy activity, you can only copy from Binary dataset to Binary dataset." So I guess it's parquet then?
upvoted 11 times
conscience
1 year, 7 months ago
I have used Binary to copy entire folders with their subfolders and files, which were both CSV and Parquet. So, IMO, Binary would be the correct answer.
upvoted 1 times
...
captainpike
3 years, 7 months ago
This note refers to the fact that, in the template, you have to specify "BinarySink" as the type for the target sink, and that is exactly what the Copy Data tool does (you can check this by editing the created copy pipeline and viewing its code). Choosing Binary and PreserveHierarchy copies all files exactly as they are.
upvoted 5 times
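For comparison, the Binary-to-Binary variant that several commenters describe would look roughly like the sketch below; this is an illustrative sketch with placeholder dataset names, not the official suggested answer:

    {
      "name": "CopyBinaryStorage1ToStorage2",
      "type": "Copy",
      "inputs": [ { "referenceName": "Storage1BinaryDataset", "type": "DatasetReference" } ],
      "outputs": [ { "referenceName": "Storage2BinaryDataset", "type": "DatasetReference" } ],
      "typeProperties": {
        "source": {
          "type": "BinarySource",
          "storeSettings": { "type": "AzureBlobFSReadSettings", "recursive": true }
        },
        "sink": {
          "type": "BinarySink",
          "storeSettings": { "type": "AzureBlobFSWriteSettings", "copyBehavior": "PreserveHierarchy" }
        }
      }
    }

Note that both datasets referenced by the activity would have to be Binary datasets, per the format-binary documentation quoted elsewhere in this thread.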
...
...
iooj
3 years, 3 months ago
Agree. I've checked it. With binary source and sink datasets it works.
upvoted 5 times
...
jed_elhak
3 years, 7 months ago
No, it must be Parquet: to do a Binary copy, the type property of the dataset must be set to Binary, and the data here is Parquet, so the given answer is correct.
upvoted 2 times
...
...
AbhiGola
Highly Voted 3 years, 9 months ago
The answer seems correct: the data is already stored as Parquet and the requirement is to perform no transformations, so the given answer is right.
upvoted 64 times
NintyFour
3 years ago
As the question mentions, you must minimize the time required to perform the copy activity, and Binary is faster than Parquet. Hence, Binary is the answer.
upvoted 6 times
anto69
2 years, 5 months ago
No: req 1 is "no transformations" and req 2 is "minimize the time required to perform the copy activity". Both must be met, hence it's Parquet, because it's the second-fastest choice and it requires no transformations.
upvoted 6 times
mhi
2 years ago
when doing a binary copy, you're not doing any transformation!
upvoted 4 times
...
...
...
...
JustAnotherDBA
Most Recent 8 months, 2 weeks ago
The answer is correct, for three reasons: the file format is Parquet; Parquet has the second-fastest load time; and no data transformations should happen. If we are going to quote articles, please read the WHOLE article before posting. Check out the formats that Binary can handle: "When using Binary dataset in copy activity, you can only copy from Binary dataset to Binary dataset."
upvoted 12 times
JustAnotherDBA
2 years, 5 months ago
https://learn.microsoft.com/en-us/azure/data-factory/format-binary
upvoted 3 times
...
mtc9
1 year, 11 months ago
Binary to Binary copies the files as they are, retaining the same content and hence the same format, and it's faster than Parquet because it doesn't need to parse the files at all, just copy them.
upvoted 2 times
...
...
Lestrang
8 months, 2 weeks ago
According to ChatGPT: while the Binary dataset type would be the fastest for copying the data from one Azure storage account to another, it would not be the correct option in this scenario because it does not retain the original format of the files. If the files contain data stored in the Apache Parquet format, specifying the source dataset type as Binary would cause Data Factory to treat the files as generic binary files and copy the data as-is, without recognizing the original format. This would mean losing the original format of the files, possibly losing the structure of the data, and it could also make the data harder to read. Also, when you copy files using the Binary dataset type, Data Factory cannot detect changes in the files and copies all the data each time, which can be inefficient in terms of time and storage. It really gives shitty Azure answers in general, but I'll go for Parquet on this one.
upvoted 12 times
mtc9
1 year, 11 months ago
ChatGPT is plainly wrong; the Binary type retains the original Parquet format, because it simply copies the files as they are, and it's faster than a Parquet dataset because it doesn't require parsing the files. Binary is correct.
upvoted 1 times
...
...
klayytech
8 months, 2 weeks ago
The answer is still: source dataset type Parquet, copy activity copy behavior PreserveHierarchy. Even though Binary can be used as the source dataset type, it is not the best option in this scenario. The original folder structure is important, and using Parquet as the source dataset type with PreserveHierarchy will ensure that the files are copied in their original format and that the folder structure is preserved in the destination container. This is the best option for this scenario, as it meets all of the requirements.
upvoted 1 times
...
auwia
8 months, 2 weeks ago
When it comes to efficiency, copying data from a Parquet file to another Parquet file is generally more efficient than copying to a binary format. This is because Parquet is a columnar storage format specifically designed for efficient data compression and query performance. It leverages advanced compression techniques and data encoding to minimize storage size and optimize query execution. Copying data from a Parquet file to a binary format may require additional steps and conversions. Binary formats, such as plain text or custom binary formats, may not have the same level of built-in compression and optimization as Parquet, so the copy process may involve additional serialization and deserialization steps, resulting in increased processing overhead and potentially larger storage requirements. In summary, when the source and destination formats are both Parquet, copying between Parquet files is generally more efficient in terms of storage utilization and query performance. In my opinion, the provided answers are correct.
upvoted 3 times
...
Fusejonny1
1 year, 4 months ago
Source dataset type should be set to binary. The reason for this is that you’re not performing any transformations on the data, you’re simply copying it from one location to another while retaining the original folder structure. The binary dataset in Azure Data Factory is used for copying files as-is without parsing the file data.
upvoted 2 times
...
kkk5566
1 year, 9 months ago
Binary & PreserveHierarchy
upvoted 3 times
...
tonyfig
1 year, 9 months ago
Binary & PreserveHierarchy. The Parquet option is used when you want to copy data stored in the Apache Parquet format and perform transformations on the data during the copy activity. However, in this scenario the requirement is to perform no transformations and to minimize the time required for the copy activity. The Binary option is better suited here, as it copies the data as-is, without performing any transformations, and minimizes the copy time.
upvoted 4 times
...
rocky48
1 year, 11 months ago
The answer seems correct: the data is already stored as Parquet and the requirement is to perform no transformations, so the given answer is right. Source dataset type: Parquet. Copy activity copy behavior: PreserveHierarchy.
upvoted 4 times
...
trantrongw
2 years, 2 months ago
Agree. I've checked it.
upvoted 1 times
...
Rrk07
2 years, 6 months ago
The answer is correct.
upvoted 1 times
...
temacc
2 years, 6 months ago
Binary copies the files as-is in the fastest way. PreserveHierarchy preserves the folder structure.
upvoted 2 times
...
OldSchool
2 years, 6 months ago
The answer is correct: no transformations, and preserve the hierarchy.
upvoted 1 times
...
RBKasemodel
2 years, 6 months ago
I believe the answer should be Binary, since it is stated that no transformations must be done. "You can use Binary dataset in Copy activity, GetMetadata activity, or Delete activity. When using Binary dataset, the service does not parse file content but treats it as-is." https://learn.microsoft.com/en-us/azure/data-factory/format-binary I couldn't find any information saying that Parquet won't be parsed if the source and sink are both Parquet files. So I think it will be parsed, and we can consider that a transformation. (A minimal Binary dataset definition is sketched just after this thread.)
upvoted 2 times
alphilla
1 year, 4 months ago
"When using Binary dataset in copy activity, you can only copy from Binary dataset to Binary dataset." Why do you pretend to preserve the hierarchy if the same hierarchy
upvoted 1 times
...
...
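As a companion to the format-binary quote in the comment above, a minimal Binary dataset definition pointing at an ADLS Gen2 container might look like the sketch below; the dataset, linked service, file system, and folder names are placeholders:

    {
      "name": "Storage1BinaryDataset",
      "properties": {
        "type": "Binary",
        "linkedServiceName": {
          "referenceName": "Storage1LinkedService",
          "type": "LinkedServiceReference"
        },
        "typeProperties": {
          "location": {
            "type": "AzureBlobFSLocation",
            "fileSystem": "container1",
            "folderPath": "input/folder"
          }
        }
      }
    }

A matching Binary dataset would be defined against Storage2 for the sink, since a Binary source can only be copied to a Binary sink.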
allagowf
2 years, 8 months ago
The answer seems correct. My advice: don't overthink it; the source is Parquet and it's one of the options, so it is Parquet.
upvoted 7 times
...
Deeksha1234
2 years, 9 months ago
The given answer is correct.
upvoted 2 times
...
Community vote distribution: A (35%), C (25%), B (20%), other.