Predicate push-down is an optimization where conditions (such as filters) are pushed as close to the data source as possible (often to the database or file system level), reducing the amount of data read and processed. If predicate push-down isn't being leveraged, it can result in reading unnecessary data, leading to performance degradation.
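The effect can be sketched in plain Python (a toy model of a data source, not Spark's API; the table, row counts, and predicate below are invented for illustration):

```python
# Toy model of predicate push-down: compare how many rows cross the
# "read" boundary with and without pushing the filter to the source.

ROWS = [{"id": i, "amount": i * 10} for i in range(1000)]

def read_then_filter(rows):
    """No push-down: the reader materializes every row; filtering happens later."""
    rows_read = len(rows)                      # all rows are read from the source
    matching = [r for r in rows if r["amount"] > 9900]
    return matching, rows_read

def read_with_pushdown(rows, predicate):
    """Push-down: the source evaluates the predicate during the scan,
    so only matching rows are ever handed to the reader."""
    rows_read = 0
    matching = []
    for r in rows:                             # the scan stays inside the source
        if predicate(r):
            matching.append(r)
            rows_read += 1                     # only matching rows leave the source
    return matching, rows_read

res_a, read_a = read_then_filter(ROWS)
res_b, read_b = read_with_pushdown(ROWS, lambda r: r["amount"] > 9900)

print(read_a)  # 1000 rows read without push-down
print(read_b)  # 9 rows read with push-down
```

In Spark, this same difference is what surfaces as the Input metric on the stage detail screen: a scan with push-down reports far less input data than one that reads the full table.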
Execute a query --> Click View and go to the Spark UI --> Navigate to the SQL/DataFrame tab in the Spark UI --> Click on any stage --> Open the details to find the Physical Plan
When predicate push-down is working properly, the amount of data read should be much lower, because the data source filters out rows at read time based on the query predicates. If predicate push-down is not leveraged, stages may read a much larger volume of data than necessary, which can be observed in the Input column on the stage detail screen.
Therefore, B is the correct option.
Not A: executor logs might contain some relevant information, but they are not the most direct way to assess predicate push-down behavior.
Not C: this view is used to check RDD caching and persistence, not predicate push-down.
Not D: it holds metadata and statistics, but it is not viewed via the Spark UI for diagnosing query performance.
Not E: while the physical plan in the query detail screen may show whether filters were pushed down, interpreting it requires more expertise; the input data size metric (option B) is a more straightforward indicator.