
Exam DP-201 topic 19 question 3 discussion

Actual exam question from Microsoft's DP-201
Question #: 3
Topic #: 19

You need to recommend the appropriate storage and processing solution.
What should you recommend?

  • A. Enable auto-shrink on the database.
  • B. Flush the blob cache using Windows PowerShell.
  • C. Enable Apache Spark RDD (RDD) caching.
  • D. Enable Databricks IO (DBIO) caching.
  • E. Configure the reading speed using Azure Data Studio.
Suggested Answer: C
Scenario: You must be able to use a file system view of data stored in a blob. You must build an architecture that will allow Contoso to use the DBFS filesystem layer over a blob store.
Databricks File System (DBFS) is a distributed file system installed on Azure Databricks clusters. Files in DBFS persist to Azure Blob storage, so you won't lose data even after you terminate a cluster.
The Databricks Delta cache, previously named Databricks IO (DBIO) caching, accelerates data reads by creating copies of remote files in nodes' local storage using a fast intermediate data format. The data is cached automatically whenever a file has to be fetched from a remote location. Successive reads of the same data are then performed locally, which results in significantly improved reading speed.
Reference:
https://docs.databricks.com/delta/delta-cache.html#delta-cache
Design Azure data storage solutions
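The explanation above describes read-through caching: the first read of a remote file copies it to the node's local storage, and later reads of the same data are served locally. A minimal toy sketch of that pattern (illustrative only; the class, names, and in-memory "remote store" are assumptions, not the Databricks implementation):

```python
class ReadThroughCache:
    """Toy read-through cache: the first read fetches from the 'remote'
    store and keeps a local copy; subsequent reads are served locally."""

    def __init__(self, remote):
        self.remote = remote        # dict standing in for remote blob storage
        self.local = {}             # node-local copies of fetched files
        self.remote_fetches = 0     # counts trips to remote storage

    def read(self, path):
        if path not in self.local:
            # Slow path: file not cached yet, fetch from remote and copy locally.
            self.remote_fetches += 1
            self.local[path] = self.remote[path]
        # Fast path: serve the local copy.
        return self.local[path]


remote_store = {"/data/part-0000.parquet": b"col1,col2"}
cache = ReadThroughCache(remote_store)
cache.read("/data/part-0000.parquet")
cache.read("/data/part-0000.parquet")
print(cache.remote_fetches)  # 1 -- only the first read hit remote storage
```

This is the behavior that makes successive reads significantly faster: only the first access pays the remote-fetch cost.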

Comments

STH
Highly Voted 5 years, 6 months ago
Answer is D, not C
upvoted 44 times
...
runningman
Highly Voted 5 years, 1 month ago
The entire explanation supports D; the answer should be D. Ditto's explanation below does not eliminate D, right?
upvoted 13 times
...
davita8
Most Recent 4 years, 1 month ago
D. Enable Databricks IO (DBIO) caching.
upvoted 2 times
...
mohowzeh
4 years, 4 months ago
The Databricks IO cache, now the Delta cache, is used in the context of a Delta Lake, which is not the case here. Apache Spark RDD caching keeps datasets in memory, which seems more fit for purpose? In the end I don't know for sure. What I do know is that this business case and its questions excel in vagueness and inaccuracy of wording.
upvoted 2 times
...
BungyTex
4 years, 5 months ago
The answer ticked is C, but the explanation below talks about D.
upvoted 1 times
...
syu31svc
4 years, 6 months ago
"You must build an architecture that will allow Contoso to use the DB FS filesystem layer over a blob store" Answer is D for sure
upvoted 2 times
...
ditto
5 years, 4 months ago
I think it's C because of the file formats accepted. The Delta cache supports reading Parquet files in DBFS, Amazon S3, HDFS, Azure Blob storage, Azure Data Lake Storage Gen1, and Azure Data Lake Storage Gen2 (on Databricks Runtime 5.1 and above). It does not support other storage formats such as CSV, JSON, and ORC. https://docs.databricks.com/delta/optimizations/delta-cache.html#delta-and-rdd-cache-comparison
upvoted 3 times
...
Shir
5 years, 5 months ago
Correct, the answer here should be D, not C.
upvoted 5 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other