
Certified Data Engineer Professional exam: Topic 1, Question 101 discussion

Actual exam question from Databricks' Certified Data Engineer Professional
Question #: 101
Topic #: 1

Which indicators would you look for in the Spark UI’s Storage tab to signal that a cached table is not performing optimally? Assume you are using Spark’s MEMORY_ONLY storage level.

  • A. Size on Disk is < Size in Memory
  • B. The RDD Block Name includes the “*” annotation signaling a failure to cache
  • C. Size on Disk is > 0
  • D. The number of Cached Partitions > the number of Spark Partitions
  • E. On Heap Memory Usage is within 75% of Off Heap Memory Usage
Suggested Answer: C
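
For context, a minimal PySpark sketch of the scenario the question describes (the app name and dataset size are illustrative, not from the exam):

    from pyspark.sql import SparkSession
    from pyspark import StorageLevel

    spark = SparkSession.builder.appName("memory-only-cache").getOrCreate()

    # MEMORY_ONLY caches strictly in memory: partitions that do not fit
    # are dropped and recomputed from lineage, never written to disk.
    df = spark.range(50_000_000)           # illustrative dataset
    df.persist(StorageLevel.MEMORY_ONLY)
    df.count()                             # action that materializes the cache

    # The cached table now appears in the Spark UI's Storage tab
    # (default http://localhost:4040 for a local driver).
    print(df.storageLevel)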

Comments

vctrhugo
Highly Voted 1 year, 5 months ago
Selected Answer: C
C. Size on Disk is > 0. When using Spark's MEMORY_ONLY storage level, the ideal scenario is that the data is fully cached in memory and Size on Disk is 0 (indicating that no data has spilled to disk). If Size on Disk is greater than 0, it suggests that some data has been spilled to disk, which can degrade performance, since reading from disk is slower than reading from memory.
upvoted 7 times
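
The Size on Disk figure quoted here can also be read programmatically from Spark's monitoring REST API, which serves the same data as the Storage tab. A minimal sketch, assuming a local driver UI on the default port:

    import requests

    BASE = "http://localhost:4040/api/v1"  # assumption: default local Spark UI
    app_id = requests.get(f"{BASE}/applications").json()[0]["id"]

    for rdd in requests.get(f"{BASE}/applications/{app_id}/storage/rdd").json():
        # Under MEMORY_ONLY, diskUsed should stay at 0 bytes.
        print(rdd["name"], rdd["storageLevel"], "diskUsed:", rdd["diskUsed"])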
Billybob0604
Most Recent 3 days, 9 hours ago
Selected Answer: B
In the Spark UI’s Storage tab, when you're using the MEMORY_ONLY storage level, Spark tries to cache the RDD/table completely in memory. If a partition of the RDD does not fit into memory, Spark does not cache that partition and recomputes it when needed. The indicator is a * in the Block Name: an asterisk next to an RDD block name in the Storage tab indicates that Spark failed to cache that block.
upvoted 1 times
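
The same REST endpoint also reports how many partitions actually made it into the cache, which is the programmatic counterpart of the asterisk described in this comment (a sketch under the same local-driver assumption as above):

    import requests

    BASE = "http://localhost:4040/api/v1"  # assumption: default local Spark UI
    app_id = requests.get(f"{BASE}/applications").json()[0]["id"]

    for rdd in requests.get(f"{BASE}/applications/{app_id}/storage/rdd").json():
        cached, total = rdd["numCachedPartitions"], rdd["numPartitions"]
        if cached < total:
            # Under MEMORY_ONLY these partitions were dropped, not spilled;
            # they will be recomputed from lineage on the next access.
            print(f"{rdd['name']}: only {cached}/{total} partitions cached")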
gloomy_marmot
5 days, 20 hours ago
Selected Answer: B
In the Spark UI's Storage tab, an indicator that a cached table is not performing optimally would be the presence of the _disk annotation in the RDD Block Name. This annotation indicates that some partitions of the cached data have been spilled to disk because there wasn't enough memory to hold them. This is suboptimal because accessing data from disk is much slower than from memory. The goal of caching is to keep data in memory for fast access, and a spill to disk means that this goal is not fully achieved.
upvoted 1 times
79f0e18
1 month ago
Selected Answer: A
Under MEMORY_ONLY, Spark does not write to disk, so Size on Disk should be 0, and off-heap memory is not used. In the Storage tab, an asterisk (*) next to the RDD block name (e.g., rdd_42_3*) indicates the partition could not be cached due to memory constraints.
upvoted 1 times
gloomy_marmot
5 days, 20 hours ago
But it should be B
upvoted 1 times
KadELbied
2 months, 3 weeks ago
Selected Answer: C
Surely C.
upvoted 1 times
benni_ale
7 months, 3 weeks ago
Selected Answer: C
I think it is C.
upvoted 1 times
Isio05
1 year, 1 month ago
Selected Answer: C
In this case, any data on disk means that the cache is not performing optimally.
upvoted 2 times
Community vote distribution: A (35%), C (25%), B (20%), Other (20%)