Exam: Certified Data Engineer Professional, Topic 1, Question 26 Discussion

Actual exam question from Databricks' Certified Data Engineer Professional
Question #: 26
Topic #: 1

Each configuration below is identical in that each cluster has 400 GB of RAM in total, 160 total cores, and only one executor per VM.
Given a job with at least one wide transformation, which of the following cluster configurations will result in maximum performance?

  • A. Total VMs: 1
    • 400 GB per Executor
    • 160 Cores / Executor
  • B. Total VMs: 8
    • 50 GB per Executor
    • 20 Cores / Executor
  • C. Total VMs: 16
    • 25 GB per Executor
    • 10 Cores / Executor
  • D. Total VMs: 4
    • 100 GB per Executor
    • 40 Cores / Executor
  • E. Total VMs: 2
    • 200 GB per Executor
    • 80 Cores / Executor
Suggested Answer: B
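For context on what the question is testing: a wide transformation is one whose output partitions depend on many input partitions (joins, groupBy aggregations), so Spark has to shuffle data between tasks. Below is a minimal PySpark sketch, with hypothetical table and column names, of a job that triggers such a shuffle.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical input table; any DataFrame with a grouping key would do.
orders = spark.read.table("orders")

# groupBy/agg is a wide transformation: all rows with the same customer_id
# must end up in the same task, so Spark inserts an Exchange (shuffle) stage.
totals = (
    orders
    .groupBy("customer_id")
    .agg(F.sum("amount").alias("total_amount"))
)

# The physical plan shows the Exchange node; whether that exchange stays on
# one VM or crosses the network is exactly what the answer options change.
totals.explain()
```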

Comments

robson90
Highly Voted 1 year, 8 months ago
Option A. The question is about maximum performance. A wide transformation often results in an expensive shuffle, and with a single executor that problem goes away. https://docs.databricks.com/en/clusters/cluster-config-best-practices.html#complex-batch-etl
upvoted 43 times
dp_learner
1 year, 6 months ago
source : https://docs.databricks.com/en/clusters/cluster-config-best-practices.html
upvoted 3 times
...
...
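To make the argument in the thread above concrete, the single-executor and eight-executor options map to Spark executor settings roughly as in the sketch below. This is illustrative only: on Databricks the per-executor memory and core counts follow from the chosen worker node type rather than from hand-set Spark properties.

```python
# Per-executor settings implied by two of the options (illustrative sketch only).
option_a = {
    "spark.executor.instances": "1",    # one VM, one large executor
    "spark.executor.cores": "160",
    "spark.executor.memory": "400g",    # shuffle data never crosses the network
}

option_b = {
    "spark.executor.instances": "8",    # eight VMs, eight executors
    "spark.executor.cores": "20",
    "spark.executor.memory": "50g",     # shuffle blocks are exchanged between VMs
}
```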
Ashok_Choudhary_CT
Most Recent 1 month ago
Selected Answer: C
Why option C excels: more executors (16 vs. 8 in option B) gives faster parallel execution; fewer cores per executor (10 vs. 20 in option B) prevents CPU contention and scheduling delays; and less memory per executor (25 GB vs. 50 GB in option B) reduces GC overhead. Final verdict: option C is the best configuration for handling a job with wide transformations.
upvoted 3 times
...
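One way to weigh the competing arguments for A, B and C is to notice what does and does not change across the options: total cores and memory per core are identical everywhere, so the real trade-off is shuffle locality versus per-executor heap size (GC pressure), scheduling granularity and fault tolerance. A quick sketch of that arithmetic, using the numbers from the question:

```python
# (executors, GB per executor, cores per executor) as given in the question.
options = {
    "A": (1, 400, 160),
    "B": (8, 50, 20),
    "C": (16, 25, 10),
    "D": (4, 100, 40),
    "E": (2, 200, 80),
}

for name, (executors, mem_gb, cores) in options.items():
    total_cores = executors * cores   # 160 for every option
    gb_per_core = mem_gb / cores      # 2.5 GB for every option
    print(f"{name}: {executors:>2} executors, {total_cores} total cores, "
          f"{gb_per_core} GB/core, {mem_gb} GB heap per executor")
```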
capt2101akash
1 month, 1 week ago
Selected Answer: A
The question asks about maximum performance for a job with one large wide transformation, which calls for fewer, larger VMs/executors. Therefore, one should choose the largest possible option.
upvoted 1 times
...
shaswat1404
2 months, 3 weeks ago
Selected Answer: C
Overly large executors are bad due to high garbage collection (GC) overhead and inefficient parallelism. Option C provides the best balance of parallelism, memory utilization, and performance efficiency.
upvoted 1 times
...
fabiospont
2 months, 4 weeks ago
Selected Answer: A
A is correct: only one VM for the job.
upvoted 1 times
...
HairyTorso
4 months ago
Selected Answer: B
From the Databricks guide on the number of workers: "Choosing the right number of workers requires some trials and iterations to figure out the compute and memory needs of a Spark job. Here are some guidelines to help you start:
• Never choose a single worker for a production job, as it will be the single point for failure.
• Start with 2-4 workers for small workloads (for example, a job with no wide transformations like joins and aggregations).
• Start with 8-10 workers for medium to big workloads that involve wide transformations like joins and aggregations, then scale up if necessary."
https://www.databricks.com/discover/pages/optimize-data-workloads-guide#number-workers
upvoted 4 times
...
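Following the guideline quoted above, a cluster for this workload would be declared with 8 workers rather than one large node. A hedged sketch of what that might look like as a Databricks Clusters API payload built in Python; num_workers, spark_version, node_type_id and cluster_name are standard fields, but the concrete values below are placeholders, not verified sizings.

```python
# Hypothetical cluster spec for the Databricks Clusters API create call.
# The placeholder values are meant to mirror option B (8 workers,
# 50 GB / 20 cores each), not a specific cloud instance type.
cluster_spec = {
    "cluster_name": "wide-transformation-etl",
    "spark_version": "<lts-runtime-version>",           # placeholder
    "node_type_id": "<worker-with-50GB-and-20-cores>",  # placeholder
    "num_workers": 8,  # "start with 8-10 workers" for jobs with wide transformations
}
```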
arekm
4 months ago
Selected Answer: A
Maximum performance: A guarantees no shuffle traffic between nodes in the cluster, since everything runs on a single VM.
upvoted 1 times
...
AlejandroU
4 months, 3 weeks ago
Selected Answer: B
Answer B offers a good balance with 8 executors, providing a decent amount of memory and cores per executor, allowing for significant parallel processing. Option C increases the number of executors further but at the cost of reduced memory and cores per executor, which might not be as effective for wide transformations.
upvoted 1 times
arekm
4 months ago
The question is about maximum performance.
upvoted 1 times
...
...
janeZ
4 months, 3 weeks ago
Selected Answer: C
For wide transformations, leveraging multiple executors typically results in better performance, resource utilization, and fault tolerance.
upvoted 2 times
...
Shakmak
5 months ago
Selected Answer: B
B is the correct answer, based on https://www.databricks.com/discover/pages/optimize-data-workloads-guide#all-purpose
upvoted 2 times
...
AndreFR
5 months, 3 weeks ago
Selected Answer: B
Besides the fact that A and E do not provide enough parallelism and fault tolerance, I can't explain why, but the correct answer is B. I got the same question during the exam and scored 100% on the Tooling section with answer B. (B is also the answer given by other sites similar to ExamTopics.) Choosing between B, C, and D is tricky!
upvoted 3 times
Snakode
5 months ago
Exactly. Also, how would one node resolve the shuffle issue?
upvoted 1 times
Nicks_name
4 months, 3 weeks ago
VM != node
upvoted 1 times
...
...
...
kimberlyvsmith
5 months, 4 weeks ago
Selected Answer: B
B "Number of workers Choosing the right number of workers requires some trials and iterations to figure out the compute and memory needs of a Spark job. Here are some guidelines to help you start: Never choose a single worker for a production job, as it will be the single point for failure Start with 2-4 workers for small workloads (for example, a job with no wide transformations like joins and aggregations) Start with 8-10 workers for medium to big workloads that involve wide transformations like joins and aggregations, then scale up if necessary"
upvoted 3 times
benni_ale
5 months, 3 weeks ago
https://www.databricks.com/discover/pages/optimize-data-workloads-guide
upvoted 1 times
...
...
arik90
1 year, 1 month ago
Selected Answer: A
A wide transformation falls under complex ETL, which means option A is correct; the documentation doesn't say to do otherwise in this scenario.
upvoted 1 times
...
PrashantTiwari
1 year, 2 months ago
A is correct
upvoted 1 times
...
vikrampatel5
1 year, 3 months ago
Selected Answer: A
Option A: https://docs.databricks.com/en/clusters/cluster-config-best-practices.html#complex-batch-etl
upvoted 3 times
...
RafaelCFC
1 year, 3 months ago
Selected Answer: A
robson90's response explains it perfectly and has documentation to support it.
upvoted 1 times
...
ofed
1 year, 5 months ago
Option A
upvoted 2 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other