Exam Certified Data Engineer Professional topic 1 question 172 discussion

Actual exam question from Databricks's Certified Data Engineer Professional
Question #: 172
Topic #: 1

The data engineer is using Spark's MEMORY_ONLY storage level.

Which indicators should the data engineer look for in the Spark UI's Storage tab to signal that a cached table is not performing optimally?

  • A. On Heap Memory Usage is within 75% of Off Heap Memory Usage
  • B. The RDD Block Name includes the “*” annotation signaling a failure to cache
  • C. Size on Disk is > 0
  • D. The number of Cached Partitions > the number of Spark Partitions
Suggested Answer: C
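
For anyone who wants to check this outside the UI, here is a minimal sketch that applies the "Size on Disk > 0" test to the data behind the Storage tab. It assumes the driver exposes Spark's monitoring REST API (the `/api/v1/applications/{app-id}/storage/rdd` endpoint) and that each entry carries `diskUsed`, `numPartitions` and `numCachedPartitions` fields; verify the field names against your Spark version. The helper names and the sample entries are hypothetical and only illustrate the check. The second check (fewer cached partitions than total) is an addition, based on Spark's documented MEMORY_ONLY behavior of leaving partitions that do not fit uncached and recomputing them when needed.

```python
import json
from urllib.request import urlopen


def fetch_cached_rdds(base_url, app_id):
    """Return the cached-RDD list from Spark's monitoring REST API.

    base_url is the driver UI address, e.g. "http://localhost:4040"
    (an assumption; use whatever address your cluster exposes).
    """
    url = f"{base_url}/api/v1/applications/{app_id}/storage/rdd"
    with urlopen(url) as resp:
        return json.load(resp)


def cache_warnings(rdd):
    """Flag signs that a MEMORY_ONLY cache entry is not fully in memory."""
    warnings = []
    # Indicator from answer C: any bytes on disk.
    if rdd.get("diskUsed", 0) > 0:
        warnings.append("Size on Disk > 0")
    # Extra check: some partitions were never cached at all.
    cached = rdd.get("numCachedPartitions", 0)
    total = rdd.get("numPartitions", 0)
    if cached < total:
        warnings.append(f"only {cached} of {total} partitions cached")
    return warnings


# Hypothetical entries shaped like the REST response (not real measurements).
healthy = {"name": "sales", "numPartitions": 8, "numCachedPartitions": 8,
           "memoryUsed": 1024, "diskUsed": 0}
degraded = {"name": "events", "numPartitions": 8, "numCachedPartitions": 5,
            "memoryUsed": 640, "diskUsed": 384}

for entry in (healthy, degraded):
    print(entry["name"], cache_warnings(entry))
```

Against a live application you would call `fetch_cached_rdds(...)` and pass each returned entry to `cache_warnings`.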

Comments

KadELbied
1 month, 2 weeks ago
Selected Answer: B
Correct Answer: B. In the Spark UI's Storage tab, an indicator that a cached table is not performing optimally would be the presence of the _disk annotation in the RDD Block Name. This annotation indicates that some partitions of the cached data have been spilled to disk because there wasn't enough memory to hold them. This is suboptimal because reading data from disk is much slower than reading it from memory. The goal of caching is to keep data in memory for fast access, and a spill to disk means that goal is not fully achieved.
upvoted 1 time
KadELbied
1 month, 2 weeks ago
Sorry, it's C.
upvoted 1 time
RuiCarvalhoDEV
7 months ago
Selected Answer: C
The storage level is MEMORY_ONLY.
upvoted 1 time
Hadiler
10 months, 3 weeks ago
Selected Answer: C
C is correct
upvoted 2 times
03355a2
11 months, 4 weeks ago
Selected Answer: C
It's simple: if MEMORY_ONLY is used, anything spilled to disk indicates a problem.
upvoted 1 time
03355a2
11 months, 4 weeks ago
The RDD answer (B) is incorrect for this question. Although it indicates a failure to cache, it identifies individual blocks that failed to cache rather than giving a general signal of suboptimal performance for the entire cached table.
upvoted 1 time
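
To illustrate the block-level versus table-level distinction made in the comment above, here is a rough sketch that rolls per-block storage entries up into a single table-level summary. It assumes per-partition records with `blockName`, `memoryUsed` and `diskUsed` fields, as in the `partitions` list that Spark's monitoring REST API returns for a single cached RDD; check the field names against your Spark version. The function name, block names and byte counts are hypothetical.

```python
def summarize_blocks(partitions):
    """Aggregate per-block storage info into one table-level view."""
    # Blocks with any bytes on disk are the individual offenders.
    on_disk = [p["blockName"] for p in partitions if p.get("diskUsed", 0) > 0]
    return {
        "memoryUsed": sum(p.get("memoryUsed", 0) for p in partitions),
        # A non-zero total here is the table-level "Size on Disk > 0" signal.
        "diskUsed": sum(p.get("diskUsed", 0) for p in partitions),
        "blocksOnDisk": on_disk,
    }


# Hypothetical per-block entries for one cached table (not real measurements).
sample_partitions = [
    {"blockName": "rdd_12_0", "memoryUsed": 256, "diskUsed": 0},
    {"blockName": "rdd_12_1", "memoryUsed": 0, "diskUsed": 192},
    {"blockName": "rdd_12_2", "memoryUsed": 256, "diskUsed": 0},
]

summary = summarize_blocks(sample_partitions)
print(summary)
```

The `diskUsed` total answers the question for the whole table, while `blocksOnDisk` only tells you which blocks are responsible.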
hpkr
1 year ago
Selected Answer: C
C is correct here
upvoted 2 times
Freyr
1 year ago
Selected Answer: B
Correct Answer: B. Option B is the most relevant indicator that a cached table is not performing optimally in a MEMORY_ONLY scenario. If an RDD block name includes the "*" annotation, it strongly suggests a caching problem, which would directly affect the performance and expected behavior of MEMORY_ONLY caching: the data was not cached entirely in memory, which is what MEMORY_ONLY intends. Option C could also be a relevant indicator in general caching scenarios (e.g., MEMORY_AND_DISK), but it contradicts the MEMORY_ONLY setting directly. Therefore, Option B is chosen based on the specific storage level described.
upvoted 1 time
Freyr
1 year ago
The correct answer is C; please ignore my previous answer. Long story short, B is correct in the context of a non-functional requirement, but the question is about a functional requirement. Sorry for the confusion.
upvoted 3 times
imatheushenrique
1 year ago
B. This annotation says that some partitions of the cached data have been spilled to disk because there wasn't enough memory to keep them.
upvoted 1 time
MDWPartners
1 year ago
I would say C
upvoted 2 times
Community vote distribution: A (35%), C (25%), B (20%), Other