Exam DP-203 topic 4 question 71 discussion

Actual exam question from Microsoft's DP-203

Question #: 71
Topic #: 4

You have an Azure subscription that contains an Azure Data Lake Storage Gen2 container named Container1 and an Azure Synapse Analytics workspace named Workspace1.

Workspace1 contains multiple Apache Spark jobs that reference a large dataset in Container1.

You need to optimize the run times of the jobs.

What should you do?

A. For Container1, disable hierarchical namespaces.
B. Cache the dataset.
C. Increase the spark.sql.autoBroadcastJoinThreshold value.
D. Use Resilient Distributed Datasets (RDDs).


Suggested Answer: B

by EnigmaOracle at Feb. 8, 2025, 12:39 a.m.

Comments


imatheushenrique

5 months ago

Selected Answer: B

B. Cache the dataset. In Spark, caching stores data in memory to speed up repeated access to it. When you cache a dataset, Spark keeps the data in memory so that it can be retrieved quickly the next time it is needed. That fits the scenario: Workspace1 contains multiple Apache Spark jobs that reference the same large dataset in Container1.

upvoted 1 time
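
To make the caching answer concrete, here is a minimal PySpark-style sketch of what "cache the dataset" could look like in a Synapse notebook. It is an illustration under stated assumptions, not part of the exam material: the helper names `abfss_uri` and `load_and_cache`, the storage account name, the folder path, and the Parquet format are all hypothetical. The `spark` argument is assumed to be an existing SparkSession (Synapse notebooks normally provide one).

```python
def abfss_uri(container: str, account: str, path: str) -> str:
    """Build an ADLS Gen2 URI: abfss://<container>@<account>.dfs.core.windows.net/<path>.

    Hypothetical helper; the account name and path are placeholders you supply.
    """
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def load_and_cache(spark, uri: str):
    """Read a dataset once and mark it for caching in the current Spark session.

    Assumes the data is stored as Parquet; swap the reader for other formats.
    """
    df = spark.read.parquet(uri)
    df.cache()   # lazy: only marks the DataFrame to be cached
    df.count()   # running an action forces the read and populates the cache
    return df
```

A possible usage, again with placeholder names: `df = load_and_cache(spark, abfss_uri("container1", "<storage-account>", "sales/"))`. Later queries on `df` in the same Spark session can then reuse the cached data instead of re-reading Container1, and `df.unpersist()` releases the memory when the data is no longer needed. Note that the cache lives inside one Spark application, so it helps jobs that run in the same session.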

...
