
Exam DP-201 topic 19 question 3 discussion

Actual exam question from Microsoft's DP-201
Question #: 3
Topic #: 19

You need to recommend the appropriate storage and processing solution.
What should you recommend?

  • A. Enable auto-shrink on the database.
  • B. Flush the blob cache using Windows PowerShell.
  • C. Enable Apache Spark RDD (RDD) caching.
  • D. Enable Databricks IO (DBIO) caching.
  • E. Configure the reading speed using Azure Data Studio.
Suggested Answer: C
Scenario: You must be able to use a file system view of data stored in a blob. You must build an architecture that will allow Contoso to use the DBFS filesystem layer over a blob store.
Databricks File System (DBFS) is a distributed file system installed on Azure Databricks clusters. Files in DBFS persist to Azure Blob storage, so you won't lose data even after you terminate a cluster.
The Databricks Delta cache, previously named Databricks IO (DBIO) caching, accelerates data reads by creating copies of remote files in nodes' local storage using a fast intermediate data format. The data is cached automatically whenever a file has to be fetched from a remote location. Successive reads of the same data are then performed locally, which results in significantly improved reading speed.
Reference:
https://docs.databricks.com/delta/delta-cache.html#delta-cache
Design Azure data storage solutions
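The explanation above describes read-through caching: the first read of a remote file copies it to the node's local storage, and later reads of the same data are served locally. A minimal toy sketch of that pattern (illustrative only; the class, names, and in-memory "remote store" are assumptions, not the Databricks implementation):

```python
class ReadThroughCache:
    """Toy read-through cache: the first read fetches from the 'remote'
    store and keeps a local copy; subsequent reads are served locally."""

    def __init__(self, remote):
        self.remote = remote        # dict standing in for remote blob storage
        self.local = {}             # node-local copies of fetched files
        self.remote_fetches = 0     # counts trips to remote storage

    def read(self, path):
        if path not in self.local:
            # Slow path: file not cached yet, fetch from remote and copy locally.
            self.remote_fetches += 1
            self.local[path] = self.remote[path]
        # Fast path: serve the local copy.
        return self.local[path]


remote_store = {"/data/part-0000.parquet": b"col1,col2"}
cache = ReadThroughCache(remote_store)
cache.read("/data/part-0000.parquet")
cache.read("/data/part-0000.parquet")
print(cache.remote_fetches)  # 1 -- only the first read hit remote storage
```

This is the behavior that makes successive reads significantly faster: only the first access pays the remote-fetch cost.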

Comments

STH
Highly Voted 5 years, 6 months ago
Answer is D, not C
upvoted 44 times
...
runningman
Highly Voted 5 years, 1 month ago
The entire explanation supports D; the answer should be D. Ditto's explanation below does not eliminate D, right?
upvoted 13 times
...
davita8
Most Recent 4 years, 1 month ago
D. Enable Databricks IO (DBIO) caching.
upvoted 2 times
...
mohowzeh
4 years, 4 months ago
The Databricks IO cache, now the Delta cache, is used in the context of a Delta Lake, which is not the case here. Apache Spark RDD caching keeps datasets in memory, which seems more fit for purpose? In the end I don't know for sure. What I do know is that this business case and its questions excel in vagueness and inaccuracy of wording.
upvoted 2 times
...
BungyTex
4 years, 5 months ago
The answer ticked is C, but the explanation below talks about D.
upvoted 1 times
...
syu31svc
4 years, 6 months ago
"You must build an architecture that will allow Contoso to use the DB FS filesystem layer over a blob store" Answer is D for sure
upvoted 2 times
...
ditto
5 years, 4 months ago
I think it's C because of the file formats accepted. The Delta cache supports reading Parquet files in DBFS, Amazon S3, HDFS, Azure Blob storage, Azure Data Lake Storage Gen1, and Azure Data Lake Storage Gen2 (on Databricks Runtime 5.1 and above). It does not support other storage formats such as CSV, JSON, and ORC. https://docs.databricks.com/delta/optimizations/delta-cache.html#delta-and-rdd-cache-comparison
upvoted 3 times
...
Shir
5 years, 5 months ago
Correct, the answer here should be D, not C.
upvoted 5 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other