Exam DP-203 topic 4 question 71 discussion

Actual exam question from Microsoft's DP-203
Question #: 71
Topic #: 4

You have an Azure subscription that contains an Azure Data Lake Storage Gen2 container named Container1 and an Azure Synapse Analytics workspace named Workspace1.

Workspace1 contains multiple Apache Spark jobs that reference a large dataset in Container1.

You need to optimize the run times of the jobs.

What should you do?

  • A. For Container1, disable hierarchical namespaces.
  • B. Cache the dataset.
  • C. Increase the spark.sql.autoBroadcastJoinThreshold value.
  • D. Use Resilient Distributed Datasets (RDDs).
Suggested Answer: B

Comments

imatheushenrique
1 month, 2 weeks ago
Selected Answer: B
B. Cache the dataset. In Spark, caching stores data in memory to speed up later access to it. When you cache a dataset, Spark keeps the data in memory so that it can be retrieved quickly the next time it is needed. Because Workspace1 contains multiple Apache Spark jobs that reference the same large dataset in Container1, caching it avoids rereading it from storage each time it is reused.
upvoted 1 time
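Spark's own API for this is `DataFrame.cache()` (or `persist()`), which is lazy: the data is materialized in memory on the first action and reused by later actions. The stdlib-only Python sketch below is an analogy for that idea, not Spark code. `CachedDataset`, `slow_load_from_storage`, and `run_jobs` are hypothetical names, and the simulated dataset and latency are arbitrary assumptions standing in for a read from Container1.

```python
import time
from typing import Callable, Dict, List, Optional


class CachedDataset:
    """Keeps one in-memory copy of a dataset after the first (slow) load."""

    def __init__(self, loader: Callable[[], List[int]]) -> None:
        self._loader = loader
        self._data: Optional[List[int]] = None  # filled lazily on first access
        self.loads = 0  # how many times the slow loader actually ran

    def get(self) -> List[int]:
        # Analogous to Spark's lazy cache(): nothing is loaded until the first read.
        if self._data is None:
            self._data = self._loader()
            self.loads += 1
        return self._data

    def unpersist(self) -> None:
        # Drop the in-memory copy; the next get() reloads from storage.
        self._data = None


def slow_load_from_storage() -> List[int]:
    # Hypothetical stand-in for reading the large dataset from Container1.
    time.sleep(0.05)  # simulated storage latency
    return list(range(100_000))


def run_jobs(dataset: CachedDataset) -> Dict[str, int]:
    # Three independent "jobs" that all reference the same dataset.
    return {
        "total": sum(dataset.get()),
        "maximum": max(dataset.get()),
        "evens": sum(1 for x in dataset.get() if x % 2 == 0),
    }


cached = CachedDataset(slow_load_from_storage)
results = run_jobs(cached)
print(results, "storage loads:", cached.loads)
```

All three jobs run against a single load. In PySpark the analogous step is calling `cache()` on the DataFrame before running several actions on it. Spark's cache lives inside one Spark application, so the benefit assumes the jobs reuse the data within the same session.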
...
Community vote distribution: A (35%), C (25%), B (20%), Other