Exam Certified Data Engineer Professional topic 1 question 26 discussion

Actual exam question from Databricks's Certified Data Engineer Professional
Question #: 26
Topic #: 1

Each configuration below is identical to the extent that each cluster has 400 GB total of RAM, 160 total cores and only one Executor per VM.
Given a job with at least one wide transformation, which of the following cluster configurations will result in maximum performance?

  • A. • Total VMs: 1
    • 400 GB per Executor
    • 160 Cores / Executor
  • B. • Total VMs: 8
    • 50 GB per Executor
    • 20 Cores / Executor
  • C. • Total VMs: 16
    • 25 GB per Executor
    • 10 Cores / Executor
  • D. • Total VMs: 4
    • 100 GB per Executor
    • 40 Cores / Executor
  • E. • Total VMs: 2
    • 200 GB per Executor
    • 80 Cores / Executor
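
The premise that every option has the same total resources can be checked with a little arithmetic. The following is a minimal sketch in plain Python; the `configs` dictionary and the `totals` helper are hypothetical names introduced here for illustration, and the numbers are copied from the options above.

```python
# Options as listed in the question:
# (total VMs, GB of RAM per executor, cores per executor)
configs = {
    "A": (1, 400, 160),
    "B": (8, 50, 20),
    "C": (16, 25, 10),
    "D": (4, 100, 40),
    "E": (2, 200, 80),
}


def totals(vms, gb_per_executor, cores_per_executor):
    """Return (total RAM in GB, total cores) for a cluster.

    The question states one executor per VM, so the cluster totals
    are the per-executor values multiplied by the VM count.
    """
    return vms * gb_per_executor, vms * cores_per_executor


for name, cfg in configs.items():
    ram, cores = totals(*cfg)
    print(f"Option {name}: {ram} GB RAM, {cores} cores")
```

Every option works out to 400 GB and 160 cores, so the options differ only in how those resources are split across VMs.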
Suggested Answer: B

Comments

robson90
Highly Voted 1 year ago
Option A. The question is about maximum performance. A wide transformation results in a shuffle, which is often expensive. With a single executor, the shuffle stays on one VM, which avoids that cost. https://docs.databricks.com/en/clusters/cluster-config-best-practices.html#complex-batch-etl
upvoted 39 times
dp_learner
10 months, 2 weeks ago
Source: https://docs.databricks.com/en/clusters/cluster-config-best-practices.html
upvoted 3 times
...
...
arik90
Most Recent 5 months, 3 weeks ago
Selected Answer: A
A wide transformation falls under complex ETL, which means Option A is correct; the documentation does not recommend otherwise for this scenario.
upvoted 1 times
...
PrashantTiwari
7 months, 1 week ago
A is correct
upvoted 1 times
...
vikrampatel5
8 months ago
Selected Answer: A
Option A: https://docs.databricks.com/en/clusters/cluster-config-best-practices.html#complex-batch-etl
upvoted 2 times
...
RafaelCFC
8 months, 2 weeks ago
Selected Answer: A
robson90's response explains it perfectly and has documentation to support it.
upvoted 1 times
...
ofed
10 months, 2 weeks ago
Option A
upvoted 2 times
...
ismoshkov
10 months, 2 weeks ago
Selected Answer: A
Our goal is top performance. Vertical scaling is more performant than horizontal scaling, especially since we know the job needs cross-VM data exchange. Option A.
upvoted 2 times
...
dp_learner
10 months, 2 weeks ago
Response A, per the "Complex batch ETL" section: "More complex ETL jobs, such as processing that requires unions and joins across multiple tables, will probably work best when you can minimize the amount of data shuffled. Since reducing the number of workers in a cluster will help minimize shuffles, you should consider a smaller cluster like cluster A in the following diagram over a larger cluster like cluster D."
upvoted 1 times
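
To put a rough number on "reducing the number of workers will help minimize shuffles", here is a back-of-the-envelope sketch. It is not taken from the Databricks documentation. It assumes a simplified model in which shuffle records are spread evenly across executors and each record is equally likely to be destined for any executor; real jobs with skewed keys or data locality will behave differently. The function name `cross_executor_fraction` is hypothetical.

```python
from fractions import Fraction


def cross_executor_fraction(num_executors: int) -> Fraction:
    """Expected fraction of shuffle data that must leave its executor.

    Assumption (simplified model): each record's destination is uniform
    over all executors, so it stays local with probability 1/N and
    crosses the network with probability (N - 1)/N.
    """
    if num_executors < 1:
        raise ValueError("num_executors must be at least 1")
    return Fraction(num_executors - 1, num_executors)


# VM counts from options A, E, D, B, C (one executor per VM)
for vms in (1, 2, 4, 8, 16):
    frac = cross_executor_fraction(vms)
    print(f"{vms:>2} executors: {float(frac):.1%} of shuffle data crosses the network")
```

Under this model, only the single-VM configuration keeps the entire shuffle local, which is consistent with the reasoning quoted above. The model ignores other factors such as fault tolerance and garbage collection on very large JVM heaps.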
dp_learner
10 months, 2 weeks ago
Source: https://docs.databricks.com/en/clusters/cluster-config-best-practices.html
upvoted 1 times
...
...
Santitoxic
12 months ago
Selected Answer: D
Considering the need for both memory and parallelism, option D seems to offer the best balance between resources and parallel processing. It provides a reasonable amount of memory and cores per Executor while maintaining a sufficient level of parallelism with 4 Executors. This configuration is likely to result in maximum performance for a job with at least one wide transformation.
upvoted 4 times
...
mwyopme
1 year ago
Sorry, response C = 16 VMs for maximizing the wide transformation.
upvoted 2 times
...
mwyopme
1 year ago
The key message is: "Given a job with at least one wide transformation". For performance, we should maximize the number of concurrent VMs. Selecting response B. 160/10 = 16 VMs.
upvoted 1 times
...
taif12340
1 year ago
Selected Answer: D
Considering the need for both memory and parallelism, option D seems to offer the best balance between resources and parallel processing. It provides a reasonable amount of memory and cores per Executor while maintaining a sufficient level of parallelism with 4 Executors. This configuration is likely to result in maximum performance for a job with at least one wide transformation.
upvoted 1 times
...
BrianNguyen95
1 year, 1 month ago
The correct answer is E. Option E provides a substantial amount of memory and cores per executor, allowing the job to handle wide transformations efficiently. However, performance can also be influenced by factors like the nature of your specific workload, data distribution, and overall cluster utilization. It's a good practice to conduct benchmarking and performance testing with various configurations to determine the optimal setup for your specific use case.
upvoted 1 times
...
stuart_gta1
1 year, 1 month ago
C. More VMs help distribute the workload across the cluster, which results in better fault tolerance and increases the chances of job completion.
upvoted 2 times
...
asmayassineg
1 year, 1 month ago
The answer should be E. If at least one transformation is wide, one executor with 200 GB can do that job, and the rest of the tasks can be carried out on the other node.
upvoted 1 times
8605246
1 year, 1 month ago
would it be fault-tolerant?
upvoted 1 times
...
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other