
Exam Certified Data Engineer Professional topic 1 question 51 discussion

Actual exam question from Databricks's Certified Data Engineer Professional
Question #: 51
Topic #: 1

Where in the Spark UI can one diagnose a performance problem induced by not leveraging predicate push-down?

  • A. In the Executor’s log file, by grepping for "predicate push-down"
  • B. In the Stage’s Detail screen, in the Completed Stages table, by noting the size of data read from the Input column
  • C. In the Storage Detail screen, by noting which RDDs are not stored on disk
  • D. In the Delta Lake transaction log, by noting the column statistics
  • E. In the Query Detail screen, by interpreting the Physical Plan
Suggested Answer: E

Comments

Tedet
2 months ago
Selected Answer: E
Predicate push-down is an optimization where conditions (such as filters) are pushed as close to the data source as possible (often to the database or file-system level), reducing the amount of data read and processed. If predicate push-down isn't being leveraged, unnecessary data gets read, degrading performance. To diagnose it: execute a query --> open the Spark UI --> go to the SQL/DataFrame tab --> click the query --> expand Details to see the Physical Plan.
upvoted 1 times
...
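The effect described above can be illustrated with a toy sketch (plain Python, not Spark; the "source" and predicate here are made up) showing why skipping the filter at the source inflates the amount of data read:

```python
# Toy illustration of predicate push-down (hypothetical, not Spark code).
# A "source" holds rows; with push-down the filter runs inside the scan,
# so fewer rows leave the read path. Without it, every row is read first
# and the filter runs later in the engine.

rows = [{"id": i, "country": "DE" if i % 4 == 0 else "US"} for i in range(1000)]

def scan(predicate=None):
    """Simulate a scan; returns (rows_read, result)."""
    if predicate is not None:            # push-down: filter at the source
        result = [r for r in rows if predicate(r)]
        return len(result), result       # only matching rows leave the scan
    return len(rows), list(rows)         # no push-down: full scan

wanted = lambda r: r["country"] == "DE"

# Without push-down: read everything, then filter in the engine.
read_all, everything = scan()
late_filtered = [r for r in everything if wanted(r)]

# With push-down: the source applies the filter during the read.
read_pushed, pushed = scan(wanted)

print(read_all, read_pushed)   # prints 1000 250
assert late_filtered == pushed # same answer either way; only I/O differs
```

Both plans return identical results; the difference is only visible in how much data was read, which is exactly what the Spark UI surfaces.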
shaswat1404
2 months, 3 weeks ago
Selected Answer: B
When predicate push-down is working properly, the amount of data read should be much lower, because the data source can filter out rows at read time based on the query predicates. If predicate push-down is not leveraged, stages may read a much larger volume of data than necessary, which can be observed in the Input column on the Stage's Detail screen; therefore B is the correct option.
Not A: executor logs might contain some information, but they are not the most direct way to assess predicate push-down.
Not C: the Storage screen shows RDD caching and persistence, not predicate push-down.
Not D: the transaction log holds metadata and statistics, but it is not viewed via the Spark UI.
Not E: while the Physical Plan in the Query Detail screen might show push-down, interpreting it requires more expertise; the Input data size (option B) is a more straightforward indicator.
upvoted 1 times
...
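To make the Input-column argument above concrete, here is a toy sketch (plain Python with a made-up file layout, not Spark) of how a pushed-down predicate combined with per-file min/max statistics shrinks the data actually read:

```python
# Toy sketch (hypothetical, not Spark) of file skipping via min/max stats:
# a pushed-down predicate lets the reader skip whole files whose statistics
# prove no row can match, which shrinks the Input metric in the Spark UI.

files = [
    {"name": "part-0", "min_id": 0,   "max_id": 99,  "rows": 100},
    {"name": "part-1", "min_id": 100, "max_id": 199, "rows": 100},
    {"name": "part-2", "min_id": 200, "max_id": 299, "rows": 100},
]

def rows_read(lo, hi, use_stats):
    """Rows scanned for the predicate lo <= id <= hi."""
    read = 0
    for f in files:
        if use_stats and (f["max_id"] < lo or f["min_id"] > hi):
            continue                 # stats prove no match: skip the file
        read += f["rows"]            # otherwise the whole file is read
    return read

print(rows_read(150, 160, use_stats=False))  # prints 300: every file scanned
print(rows_read(150, 160, use_stats=True))   # prints 100: two files skipped
```

The gap between the two numbers is what shows up as a bloated Input column when push-down is not applied.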
benni_ale
6 months ago
Selected Answer: E
E
upvoted 1 times
...
dd1192d
6 months, 4 weeks ago
Selected Answer: E
E is correct : https://docs.datastax.com/en/dse/6.9/spark/predicate-push-down.html
upvoted 2 times
...
P1314
1 year, 2 months ago
Selected Answer: E
Query plan. Correct is E
upvoted 1 times
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other
