Exam DP-201 topic 1 question 3 discussion

Actual exam question from Microsoft's DP-201
Question #: 3
Topic #: 1

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are designing an HDInsight/Hadoop cluster solution that uses Azure Data Lake Storage Gen1.
The solution requires POSIX permissions and diagnostics logging for auditing.
You need to recommend solutions that optimize storage.
Proposed solution: Ensure that stored files are smaller than 250 MB.
Does the solution meet the goal?

  • A. Yes
  • B. No
Suggested Answer: B
Ensure that stored files are larger than 250 MB, not smaller.
You can run a separate compaction job that combines small files into larger ones (a minimal sketch of such a job follows the list below).
Note: The POSIX permissions and auditing in Data Lake Storage Gen1 come with an overhead that becomes apparent when working with numerous small files. As a best practice, batch your data into larger files rather than writing thousands or millions of small files to Data Lake Storage Gen1. Avoiding small file sizes has multiple benefits, such as:
  • Fewer authentication checks across multiple files
  • Fewer open file connections
  • Faster copying/replication
  • Fewer files to process when updating Data Lake Storage Gen1 POSIX permissions
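As an illustration of the compaction approach mentioned above, here is a minimal PySpark sketch (Spark is available on HDInsight) that reads a folder of many small files and rewrites them as a few larger ones. The adl:// paths, the Parquet format, and the output file count are hypothetical assumptions for illustration only, not details taken from the question or the referenced documentation.

from pyspark.sql import SparkSession

# Sketch of a compaction job, assuming PySpark on the HDInsight cluster and
# ADLS Gen1 credentials already configured; paths and sizes are placeholders.
spark = SparkSession.builder.appName("adls-gen1-compaction").getOrCreate()

# Hypothetical source folder holding many small Parquet files.
source_path = "adl://myaccount.azuredatalakestore.net/raw/events/"
# Hypothetical destination folder for the compacted output.
target_path = "adl://myaccount.azuredatalakestore.net/curated/events/"

df = spark.read.parquet(source_path)

# Choose the output file count so each file lands near the ~256 MB guidance;
# 8 is a placeholder, derive it from the actual input size in bytes.
num_output_files = 8

(df.coalesce(num_output_files)   # merge partitions into fewer, larger files
   .write
   .mode("overwrite")
   .parquet(target_path))

Running a job like this on a schedule keeps the per-file overhead listed above (authentication checks, open connections, ACL updates) low without changing how upstream producers write their data.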
Reference:
https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-best-practices

Comments

chaoxes
Highly Voted 4 years, 6 months ago
The given answer B. No is correct. With the POSIX-style model it is recommended to avoid small files, for the following reasons:
  • Fewer authentication checks across multiple files
  • Fewer open file connections
  • Faster copying/replication
  • Fewer files to process when updating Data Lake Storage Gen1 POSIX permissions
It is recommended that files are at least 256 MB in size. Source: https://docs.microsoft.com/pl-pl/azure/data-lake-store/data-lake-store-best-practices
upvoted 8 times
Deepu1987
Most Recent 4 years, 4 months ago
Please check the referenced link: https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-best-practices. Under "Improve throughput with parallelism" there is a section "Avoid small file sizes"; look for the line "you can have a separate compaction job that combines these files into larger ones," which shows that the proposed solution is false.
upvoted 2 times