
Exam DP-203 topic 1 question 16 discussion

Actual exam question from Microsoft's DP-203
Question #: 16
Topic #: 1

HOTSPOT -
You have two Azure Storage accounts named Storage1 and Storage2. Each account holds one container and has the hierarchical namespace enabled. The system has files that contain data stored in the Apache Parquet format.
You need to copy folders and files from Storage1 to Storage2 by using a Data Factory copy activity. The solution must meet the following requirements:
✑ No transformations must be performed.
✑ The original folder structure must be retained.
✑ Minimize time required to perform the copy activity.
How should you configure the copy activity? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area: [answer area image not shown: select the source dataset type and the copy activity copy behavior]

Suggested Answer:
Box 1: Parquet -
For Parquet datasets, the type property of the copy activity source must be set to ParquetSource.

Box 2: PreserveHierarchy -
PreserveHierarchy (default): Preserves the file hierarchy in the target folder. The relative path of the source file to the source folder is identical to the relative path of the target file to the target folder.
Incorrect Answers:
✑ FlattenHierarchy: All files from the source folder are in the first level of the target folder. The target files have autogenerated names.
✑ MergeFiles: Merges all files from the source folder to one file. If the file name is specified, the merged file name is the specified name. Otherwise, it's an autogenerated file name.
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/format-parquet
https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-data-lake-storage
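For reference, a minimal sketch of what the copy activity JSON could look like for the suggested answer, assuming ADLS Gen2 (AzureBlobFS) store settings; the activity and dataset names are illustrative placeholders, not part of the question:

    {
      "name": "CopyStorage1ToStorage2",
      "type": "Copy",
      "inputs": [ { "referenceName": "Storage1ParquetDataset", "type": "DatasetReference" } ],
      "outputs": [ { "referenceName": "Storage2ParquetDataset", "type": "DatasetReference" } ],
      "typeProperties": {
        "source": {
          "type": "ParquetSource",
          "storeSettings": { "type": "AzureBlobFSReadSettings", "recursive": true }
        },
        "sink": {
          "type": "ParquetSink",
          "storeSettings": { "type": "AzureBlobFSWriteSettings", "copyBehavior": "PreserveHierarchy" }
        }
      }
    }

The copyBehavior property on the sink's store settings is what determines whether the source folder structure is kept (PreserveHierarchy), flattened (FlattenHierarchy), or merged into a single file (MergeFiles).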

Comments

EddyRoboto
Highly Voted 8 months, 2 weeks ago
This could be Binary as both source and sink, since there are no transformations on the files. I tend to believe Binary is the correct answer.
upvoted 85 times
ayn1
2 months, 2 weeks ago
Yes, and Binary is faster because it is a direct copy; a Parquet dataset is parsed on read and rewritten at the destination, which can be slower.
upvoted 1 times
...
GameLift
3 years, 7 months ago
But the doc says "When using Binary dataset in copy activity, you can only copy from Binary dataset to Binary dataset." So I guess it's parquet then?
upvoted 11 times
conscience
1 year, 7 months ago
I have used Binary to copy entire folders with their subfolders and files, which were both CSV and Parquet. So, IMO, Binary would be the correct answer.
upvoted 1 times
...
captainpike
3 years, 7 months ago
This note refers to the fact that, in the template, you have to specify "BinarySink" as the type for the target sink, and that is exactly what the Copy Data tool does (you can check this by editing the created copy pipeline and viewing its code). Choosing Binary and PreserveHierarchy copies all files exactly as they are.
upvoted 5 times
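For comparison, the Binary-to-Binary variant that several commenters describe would look roughly like the sketch below; this is an illustrative sketch with placeholder dataset names, not the official suggested answer:

    {
      "name": "CopyBinaryStorage1ToStorage2",
      "type": "Copy",
      "inputs": [ { "referenceName": "Storage1BinaryDataset", "type": "DatasetReference" } ],
      "outputs": [ { "referenceName": "Storage2BinaryDataset", "type": "DatasetReference" } ],
      "typeProperties": {
        "source": {
          "type": "BinarySource",
          "storeSettings": { "type": "AzureBlobFSReadSettings", "recursive": true }
        },
        "sink": {
          "type": "BinarySink",
          "storeSettings": { "type": "AzureBlobFSWriteSettings", "copyBehavior": "PreserveHierarchy" }
        }
      }
    }

Note that both datasets referenced by the activity would have to be Binary datasets, per the format-binary documentation quoted elsewhere in this thread.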
...
...
iooj
3 years, 3 months ago
Agree. I've checked it. With binary source and sink datasets it works.
upvoted 5 times
...
jed_elhak
3 years, 7 months ago
No, it must be Parquet: to do a Binary copy, the type property of the dataset must be set to Binary, and the data here is Parquet, so the given answer is correct.
upvoted 2 times
...
...
AbhiGola
Highly Voted 3 years, 9 months ago
The answer seems correct: the data is already stored as Parquet and the requirement is to perform no transformations, so the given answer is right.
upvoted 64 times
NintyFour
3 years ago
As the question mentions, you must minimize the time required to perform the copy activity, and Binary is faster than Parquet. Hence, Binary is the answer.
upvoted 6 times
anto69
2 years, 5 months ago
No: req 1 is "no transformations" and req 2 is "minimize the time required to perform the copy activity". Both must be met, hence it's Parquet, because it's the second-fastest choice and it requires no transformations.
upvoted 6 times
mhi
2 years ago
when doing a binary copy, you're not doing any transformation!
upvoted 4 times
...
...
...
...
JustAnotherDBA
Most Recent 8 months, 2 weeks ago
The answer is correct, for three reasons: the file format is Parquet; Parquet has the second-fastest load time; and no data transformations should happen. If we are going to quote articles, please read the WHOLE article before posting. Check out the formats that Binary can handle: "When using Binary dataset in copy activity, you can only copy from Binary dataset to Binary dataset."
upvoted 12 times
JustAnotherDBA
2 years, 5 months ago
https://learn.microsoft.com/en-us/azure/data-factory/format-binary
upvoted 3 times
...
mtc9
1 year, 11 months ago
Binary to Binary copies the files as they are, retaining the same content and hence the same format, and it's faster than Parquet because it doesn't need to parse the files at all, just copy them.
upvoted 2 times
...
...
Lestrang
8 months, 2 weeks ago
According to ChatGPT: while the Binary dataset type would be the fastest for copying the data from one Azure storage account to another, it would not be the correct option in this scenario because it does not retain the original format of the files. If the files contain data stored in the Apache Parquet format, specifying the source dataset type as Binary would cause Data Factory to treat the files as generic binary files and copy the data as-is, without recognizing the original format. This would mean losing the original format of the files, possibly losing the structure of the data, and it could also make the data harder to read. Also, when you copy files using the Binary dataset type, Data Factory cannot detect changes in the files and copies all the data each time, which can be inefficient in terms of time and storage. It really gives shitty Azure answers in general, but I'll go for Parquet on this one.
upvoted 12 times
mtc9
1 year, 11 months ago
ChatGPT is plainly wrong; the Binary type retains the original Parquet format, because it simply copies the files as they are, and it's faster than a Parquet dataset because it doesn't require parsing the files. Binary is correct.
upvoted 1 times
...
...
klayytech
8 months, 2 weeks ago
The answer is still: source dataset type Parquet, copy activity copy behavior PreserveHierarchy. Even though Binary can be used as the source dataset type, it is not the best option in this scenario. The original folder structure is important, and using Parquet as the source dataset type with PreserveHierarchy will ensure that the files are copied in their original format and that the folder structure is preserved in the destination container. This is the best option for this scenario, as it meets all of the requirements.
upvoted 1 times
...
auwia
8 months, 2 weeks ago
When it comes to efficiency, copying data from a Parquet file to another Parquet file is generally more efficient than copying to a binary format. This is because Parquet is a columnar storage format specifically designed for efficient data compression and query performance. It leverages advanced compression techniques and data encoding to minimize storage size and optimize query execution. Copying data from a Parquet file to a binary format may require additional steps and conversions. Binary formats, such as plain text or custom binary formats, may not have the same level of built-in compression and optimization as Parquet, so the copy process may involve additional serialization and deserialization steps, resulting in increased processing overhead and potentially larger storage requirements. In summary, when the source and destination formats are both Parquet, copying between Parquet files is generally more efficient in terms of storage utilization and query performance. In my opinion, the provided answers are correct.
upvoted 3 times
...
Fusejonny1
1 year, 4 months ago
Source dataset type should be set to binary. The reason for this is that you’re not performing any transformations on the data, you’re simply copying it from one location to another while retaining the original folder structure. The binary dataset in Azure Data Factory is used for copying files as-is without parsing the file data.
upvoted 2 times
...
kkk5566
1 year, 9 months ago
Binary & PreserveHierarchy
upvoted 3 times
...
tonyfig
1 year, 9 months ago
Binary & PreserveHierarchy. The Parquet option is used when you want to copy data stored in the Apache Parquet format and perform transformations on the data during the copy activity. However, in this scenario the requirement is to perform no transformations and to minimize the time required for the copy activity. The Binary option is better suited here, as it copies the data as-is, without performing any transformations, and minimizes the copy time.
upvoted 4 times
...
rocky48
1 year, 11 months ago
The answer seems correct: the data is already stored as Parquet and the requirement is to perform no transformations, so the given answer is right. Source dataset type: Parquet. Copy activity copy behavior: PreserveHierarchy.
upvoted 4 times
...
trantrongw
2 years, 2 months ago
Agree. I've checked it.
upvoted 1 times
...
Rrk07
2 years, 6 months ago
The answer is correct.
upvoted 1 times
...
temacc
2 years, 6 months ago
Binary copies the files as-is in the fastest way. PreserveHierarchy preserves the folder structure.
upvoted 2 times
...
OldSchool
2 years, 6 months ago
The answer is correct: no transformations, and preserve the hierarchy.
upvoted 1 times
...
RBKasemodel
2 years, 6 months ago
I believe the answer should be Binary, since it is stated that no transformations must be done. "You can use Binary dataset in Copy activity, GetMetadata activity, or Delete activity. When using Binary dataset, the service does not parse file content but treats it as-is." https://learn.microsoft.com/en-us/azure/data-factory/format-binary I couldn't find any information saying that Parquet won't be parsed if the source and sink are both Parquet files. So I think it will be parsed, and we can consider that a transformation. (A minimal Binary dataset definition is sketched just after this thread.)
upvoted 2 times
alphilla
1 year, 4 months ago
"When using Binary dataset in copy activity, you can only copy from Binary dataset to Binary dataset." Why do you pretend to preserve the hierarchy if the same hierarchy
upvoted 1 times
...
...
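As a companion to the format-binary quote in the comment above, a minimal Binary dataset definition pointing at an ADLS Gen2 container might look like the sketch below; the dataset, linked service, file system, and folder names are placeholders:

    {
      "name": "Storage1BinaryDataset",
      "properties": {
        "type": "Binary",
        "linkedServiceName": {
          "referenceName": "Storage1LinkedService",
          "type": "LinkedServiceReference"
        },
        "typeProperties": {
          "location": {
            "type": "AzureBlobFSLocation",
            "fileSystem": "container1",
            "folderPath": "input/folder"
          }
        }
      }
    }

A matching Binary dataset would be defined against Storage2 for the sink, since a Binary source can only be copied to a Binary sink.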
allagowf
2 years, 8 months ago
The answer seems correct. My advice: don't overthink it; the source is Parquet and it's one of the options, so it is Parquet.
upvoted 7 times
...
Deeksha1234
2 years, 9 months ago
The given answer is correct.
upvoted 2 times
...
Community vote distribution: A (35%), C (25%), B (20%), other.