Exam: Certified Data Engineer Professional, Topic 1, Question 26 Discussion

Actual exam question from Databricks' Certified Data Engineer Professional
Question #: 26
Topic #: 1

Each configuration below is identical in that each cluster has 400 GB of RAM in total, 160 total cores, and only one executor per VM.
Given a job with at least one wide transformation, which of the following cluster configurations will result in maximum performance?

  • A. Total VMs: 1
    • 400 GB per Executor
    • 160 Cores / Executor
  • B. Total VMs: 8
    • 50 GB per Executor
    • 20 Cores / Executor
  • C. Total VMs: 16
    • 25 GB per Executor
    • 10 Cores / Executor
  • D. Total VMs: 4
    • 100 GB per Executor
    • 40 Cores / Executor
  • E. Total VMs: 2
    • 200 GB per Executor
    • 80 Cores / Executor
Suggested Answer: B
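For context on what the question is testing: a wide transformation is one whose output partitions depend on many input partitions (joins, groupBy aggregations), so Spark has to shuffle data between tasks. Below is a minimal PySpark sketch, with hypothetical table and column names, of a job that triggers such a shuffle.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical input table; any DataFrame with a grouping key would do.
orders = spark.read.table("orders")

# groupBy/agg is a wide transformation: all rows with the same customer_id
# must end up in the same task, so Spark inserts an Exchange (shuffle) stage.
totals = (
    orders
    .groupBy("customer_id")
    .agg(F.sum("amount").alias("total_amount"))
)

# The physical plan shows the Exchange node; whether that exchange stays on
# one VM or crosses the network is exactly what the answer options change.
totals.explain()
```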

Comments

robson90
Highly Voted 1 year, 8 months ago
Option A. The question is about maximum performance. A wide transformation often results in an expensive shuffle, and with a single executor that problem goes away. https://docs.databricks.com/en/clusters/cluster-config-best-practices.html#complex-batch-etl
upvoted 43 times
dp_learner
1 year, 6 months ago
source : https://docs.databricks.com/en/clusters/cluster-config-best-practices.html
upvoted 3 times
...
...
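To make the argument in the thread above concrete, the single-executor and eight-executor options map to Spark executor settings roughly as in the sketch below. This is illustrative only: on Databricks the per-executor memory and core counts follow from the chosen worker node type rather than from hand-set Spark properties.

```python
# Per-executor settings implied by two of the options (illustrative sketch only).
option_a = {
    "spark.executor.instances": "1",    # one VM, one large executor
    "spark.executor.cores": "160",
    "spark.executor.memory": "400g",    # shuffle data never crosses the network
}

option_b = {
    "spark.executor.instances": "8",    # eight VMs, eight executors
    "spark.executor.cores": "20",
    "spark.executor.memory": "50g",     # shuffle blocks are exchanged between VMs
}
```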
Ashok_Choudhary_CT
Most Recent 1 month ago
Selected Answer: C
Why option C excels: more executors (16 vs. 8 in option B) gives faster parallel execution; fewer cores per executor (10 vs. 20 in option B) prevents CPU contention and scheduling delays; and less memory per executor (25 GB vs. 50 GB in option B) reduces GC overhead. Final verdict: option C is the best configuration for handling a job with wide transformations.
upvoted 3 times
...
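One way to weigh the competing arguments for A, B and C is to notice what does and does not change across the options: total cores and memory per core are identical everywhere, so the real trade-off is shuffle locality versus per-executor heap size (GC pressure), scheduling granularity and fault tolerance. A quick sketch of that arithmetic, using the numbers from the question:

```python
# (executors, GB per executor, cores per executor) as given in the question.
options = {
    "A": (1, 400, 160),
    "B": (8, 50, 20),
    "C": (16, 25, 10),
    "D": (4, 100, 40),
    "E": (2, 200, 80),
}

for name, (executors, mem_gb, cores) in options.items():
    total_cores = executors * cores   # 160 for every option
    gb_per_core = mem_gb / cores      # 2.5 GB for every option
    print(f"{name}: {executors:>2} executors, {total_cores} total cores, "
          f"{gb_per_core} GB/core, {mem_gb} GB heap per executor")
```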
capt2101akash
1 month, 1 week ago
Selected Answer: A
The question asks about maximum performance for a job with one large wide transformation, which calls for fewer, larger VMs/executors. Therefore, one should choose the largest possible option.
upvoted 1 times
...
shaswat1404
2 months, 3 weeks ago
Selected Answer: C
Overly large executors are bad due to high garbage collection (GC) overhead and inefficient parallelism. Option C provides the best balance of parallelism, memory utilization, and performance efficiency.
upvoted 1 times
...
fabiospont
2 months, 4 weeks ago
Selected Answer: A
A is correct: only one VM for the job.
upvoted 1 times
...
HairyTorso
4 months ago
Selected Answer: B
From the Databricks guide on the number of workers: "Choosing the right number of workers requires some trials and iterations to figure out the compute and memory needs of a Spark job. Here are some guidelines to help you start:
• Never choose a single worker for a production job, as it will be the single point for failure.
• Start with 2-4 workers for small workloads (for example, a job with no wide transformations like joins and aggregations).
• Start with 8-10 workers for medium to big workloads that involve wide transformations like joins and aggregations, then scale up if necessary."
https://www.databricks.com/discover/pages/optimize-data-workloads-guide#number-workers
upvoted 4 times
...
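Following the guideline quoted above, a cluster for this workload would be declared with 8 workers rather than one large node. A hedged sketch of what that might look like as a Databricks Clusters API payload built in Python; num_workers, spark_version, node_type_id and cluster_name are standard fields, but the concrete values below are placeholders, not verified sizings.

```python
# Hypothetical cluster spec for the Databricks Clusters API create call.
# The placeholder values are meant to mirror option B (8 workers,
# 50 GB / 20 cores each), not a specific cloud instance type.
cluster_spec = {
    "cluster_name": "wide-transformation-etl",
    "spark_version": "<lts-runtime-version>",           # placeholder
    "node_type_id": "<worker-with-50GB-and-20-cores>",  # placeholder
    "num_workers": 8,  # "start with 8-10 workers" for jobs with wide transformations
}
```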
arekm
4 months ago
Selected Answer: A
Maximum performance: A guarantees no shuffle traffic between nodes in the cluster, since everything runs on a single VM.
upvoted 1 times
...
AlejandroU
4 months, 3 weeks ago
Selected Answer: B
Answer B offers a good balance with 8 executors, providing a decent amount of memory and cores per executor, allowing for significant parallel processing. Option C increases the number of executors further but at the cost of reduced memory and cores per executor, which might not be as effective for wide transformations.
upvoted 1 times
arekm
4 months ago
The question is about maximum performance.
upvoted 1 times
...
...
janeZ
4 months, 3 weeks ago
Selected Answer: C
For wide transformations, leveraging multiple executors typically results in better performance, resource utilization, and fault tolerance.
upvoted 2 times
...
Shakmak
5 months ago
Selected Answer: B
B is the correct answer, based on https://www.databricks.com/discover/pages/optimize-data-workloads-guide#all-purpose
upvoted 2 times
...
AndreFR
5 months, 3 weeks ago
Selected Answer: B
Besides the fact that A and E do not provide enough parallelism and fault tolerance, I can't explain why, but the correct answer is B. I got the same question during the exam and scored 100% on the Tooling section with answer B. (B is also the answer given by other sites similar to ExamTopics.) Choosing between B, C, and D is tricky!
upvoted 3 times
Snakode
5 months ago
Exactly. Also, how would one node resolve the shuffle issue?
upvoted 1 times
Nicks_name
4 months, 3 weeks ago
VM != node
upvoted 1 times
...
...
...
kimberlyvsmith
5 months, 4 weeks ago
Selected Answer: B
B "Number of workers Choosing the right number of workers requires some trials and iterations to figure out the compute and memory needs of a Spark job. Here are some guidelines to help you start: Never choose a single worker for a production job, as it will be the single point for failure Start with 2-4 workers for small workloads (for example, a job with no wide transformations like joins and aggregations) Start with 8-10 workers for medium to big workloads that involve wide transformations like joins and aggregations, then scale up if necessary"
upvoted 3 times
benni_ale
5 months, 3 weeks ago
https://www.databricks.com/discover/pages/optimize-data-workloads-guide
upvoted 1 times
...
...
arik90
1 year, 1 month ago
Selected Answer: A
A wide transformation falls under complex ETL, which means option A is correct; the documentation doesn't say to do otherwise in this scenario.
upvoted 1 times
...
PrashantTiwari
1 year, 2 months ago
A is correct
upvoted 1 times
...
vikrampatel5
1 year, 3 months ago
Selected Answer: A
Option A: https://docs.databricks.com/en/clusters/cluster-config-best-practices.html#complex-batch-etl
upvoted 3 times
...
RafaelCFC
1 year, 3 months ago
Selected Answer: A
robson90's response explains it perfectly and has documentation to support it.
upvoted 1 times
...
ofed
1 year, 5 months ago
Option A
upvoted 2 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other