Exam DP-201 topic 1 question 1 discussion

Actual exam question from Microsoft's DP-201
Question #: 1
Topic #: 1

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are designing an HDInsight/Hadoop cluster solution that uses Azure Data Lake Storage Gen1.
The solution requires POSIX permissions and enables diagnostics logging for auditing.
You need to recommend solutions that optimize storage.
Proposed Solution: Ensure that files stored are larger than 250 MB.
Does the solution meet the goal?

  • A. Yes
  • B. No
Suggested Answer: A
Depending on what services and workloads are using the data, a good size to consider for files is 256 MB or greater. If the file sizes cannot be batched when landing in Data Lake Storage Gen1, you can have a separate compaction job that combines these files into larger ones.
Note: POSIX permissions and auditing in Data Lake Storage Gen1 come with an overhead that becomes apparent when working with numerous small files. As a best practice, you must batch your data into larger files versus writing thousands or millions of small files to Data Lake Storage Gen1. Avoiding small file sizes can have multiple benefits, such as:
  • Lowering the authentication checks across multiple files
  • Reduced open file connections
  • Faster copying/replication
  • Fewer files to process when updating Data Lake Storage Gen1 POSIX permissions
Reference:
https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-best-practices
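The compaction job the answer mentions is typically a small Spark step. Below is a minimal sketch in PySpark, assuming a Spark-capable HDInsight cluster reading Parquet from Data Lake Storage Gen1; the adl:// paths, the estimated input size, and the per-file target are illustrative assumptions, not details from the question.

```python
# Minimal compaction sketch (assumptions: Spark on HDInsight, Parquet input,
# hypothetical adl:// paths, and a known approximate total input size).
from pyspark.sql import SparkSession

TARGET_FILE_BYTES = 256 * 1024 * 1024    # aim for files of ~256 MB or more
ESTIMATED_INPUT_BYTES = 64 * 1024 ** 3   # assumed total data size (~64 GB)

spark = SparkSession.builder.appName("adls-gen1-compaction").getOrCreate()

src = "adl://example.azuredatalakestore.net/landing/events"  # hypothetical
dst = "adl://example.azuredatalakestore.net/curated/events"  # hypothetical

# Choose an output file count that keeps each file at or above the target,
# which reduces the per-file POSIX ACL and auditing overhead noted above.
num_files = max(1, ESTIMATED_INPUT_BYTES // TARGET_FILE_BYTES)

# Read the many small landing files and rewrite them as fewer large files.
(spark.read.parquet(src)
    .coalesce(num_files)
    .write.mode("overwrite")
    .parquet(dst))
```

coalesce avoids a full shuffle when reducing the partition count; repartition(num_files) would balance output file sizes more evenly, at the cost of shuffling the data.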

Comments

Piiri565
Highly Voted 4 years, 6 months ago
"POSIX permissions and auditing in Data Lake Storage Gen1 come with an overhead that becomes apparent when working with numerous small files. As a best practice, you must batch your data into larger files versus writing thousands or millions of small files to Data Lake Storage Gen1." According to this docs resource, I think the given answer is correct.
upvoted 17 times
arpit_dataguy
Highly Voted 3 years, 11 months ago
We can ignore questions that mention Gen1, as it is out of scope now.
upvoted 5 times
Ambujinee
Most Recent 3 years, 11 months ago
Accepted file sizes range from 256 MB to 2 GB.
upvoted 1 times
cadio30
4 years ago
Per the provided link, the minimum recommended file size is 256 MB, whereas the proposed solution starts at 250 MB. I would say the answer is 'No'. Reference: https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-best-practices
upvoted 2 times
ZodiaC
3 years, 12 months ago
That doesn't really make sense for this question.
upvoted 1 times
baobabko
3 years, 11 months ago
250 MB vs. 256 MB gives less than 3% shortfall in the worst case, so it is acceptable. The answer should be Yes.
upvoted 1 times
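baobabko's worst-case figure is easy to verify; here is a quick sketch of the arithmetic, using the 250 MB and 256 MB values from the thread:

```python
# Worst case: every file lands at exactly 250 MB instead of the
# recommended 256 MB; the shortfall relative to the recommendation is:
recommended_mb = 256
actual_mb = 250
shortfall = (recommended_mb - actual_mb) / recommended_mb
print(f"{shortfall:.1%}")  # prints 2.3%, i.e. under the 3% cited above
```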
SplMonk
4 years, 1 month ago
So is this a trap question? The guidance is 256 MB and they are saying larger than 250 MB... a small difference, but below the recommended size.
upvoted 1 times
Deepu1987
4 years, 3 months ago
The given solution is correct. Typically, analytics engines such as HDInsight and Azure Data Lake Analytics have a per-file overhead, so storing your data as many small files can negatively affect performance. Please refer to this link: https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-performance-tuning-guidance#structure-your-data-set. In general, organize your data into larger files for better performance. As a rule of thumb, organize data sets in files of 256 MB or larger.
upvoted 2 times
chaoxes
4 years, 5 months ago
The given answer, B (No), is correct. In a POSIX-style model it is recommended to avoid small files, due to the following considerations: lowering the authentication checks across multiple files, reduced open file connections, faster copying/replication, and fewer files to process when updating Data Lake Storage Gen1 POSIX permissions.
upvoted 1 times
SudhakarMani
4 years, 6 months ago
Is this the correct answer?
upvoted 1 times
syu31svc
4 years, 6 months ago
The provided link says at least 256 MB, but greater than 250 MB seems good enough. I would agree with the answer.
upvoted 2 times
Torent2005
4 years, 6 months ago
Not really; it's a trap. Files should be greater than 256 MB according to best practices, so a file bigger than 250 MB, such as 251 MB, is not necessarily a solution.
upvoted 1 times
BaisArun
4 years, 6 months ago
Agree with @Piiri565
upvoted 2 times
Community vote distribution: A (35%), C (25%), B (20%), Other