Exam Certified Data Engineer Professional topic 1 question 26 discussion

Actual exam question from Databricks's Certified Data Engineer Professional
Question #: 26
Topic #: 1

Each configuration below is identical to the extent that each cluster has 400 GB total of RAM, 160 total cores and only one Executor per VM.
Given a job with at least one wide transformation, which of the following cluster configurations will result in maximum performance?

  • A. Total VMs: 1
    • 400 GB per Executor
    • 160 Cores / Executor
  • B. Total VMs: 8
    • 50 GB per Executor
    • 20 Cores / Executor
  • C. Total VMs: 16
    • 25 GB per Executor
    • 10 Cores / Executor
  • D. Total VMs: 4
    • 100 GB per Executor
    • 40 Cores / Executor
  • E. Total VMs: 2
    • 200 GB per Executor
    • 80 Cores / Executor
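
As a quick sanity check on the options above, the following minimal sketch multiplies out each configuration to confirm that all five share the same totals (400 GB, 160 cores) and differ only in how those resources are split across VMs. The dictionary layout and the `totals` helper are illustrative names, not part of the question.

```python
# Values are copied from the question; names are illustrative only.
OPTIONS = {
    "A": {"vms": 1, "gb_per_executor": 400, "cores_per_executor": 160},
    "B": {"vms": 8, "gb_per_executor": 50, "cores_per_executor": 20},
    "C": {"vms": 16, "gb_per_executor": 25, "cores_per_executor": 10},
    "D": {"vms": 4, "gb_per_executor": 100, "cores_per_executor": 40},
    "E": {"vms": 2, "gb_per_executor": 200, "cores_per_executor": 80},
}


def totals(option):
    """Return (total GB, total cores), assuming one executor per VM."""
    return (
        option["vms"] * option["gb_per_executor"],
        option["vms"] * option["cores_per_executor"],
    )


for label, option in OPTIONS.items():
    total_gb, total_cores = totals(option)
    print(f"{label}: {option['vms']:>2} VMs, {total_gb} GB total, {total_cores} cores total")
```

Because the totals are identical, the choice comes down to the trade-off between fewer, larger executors and more, smaller ones.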
Suggested Answer: B

Comments

robson90
Highly Voted 3 months, 2 weeks ago
Option A. The question is about maximum performance. A wide transformation results in an often expensive shuffle. With one executor, this problem is avoided. https://docs.databricks.com/en/clusters/cluster-config-best-practices.html#complex-batch-etl
upvoted 17 times
dp_learner
1 month, 1 week ago
source: https://docs.databricks.com/en/clusters/cluster-config-best-practices.html
upvoted 1 times
...
...
ofed
Most Recent 1 month ago
Option A
upvoted 1 times
...
ismoshkov
1 month, 1 week ago
Selected Answer: A
Our goal is top performance. Vertical scaling is more performant than horizontal scaling, especially since we know that we need cross-VM exchange. Option A.
upvoted 1 times
...
dp_learner
1 month, 1 week ago
Answer A. From the "Complex batch ETL" section: "More complex ETL jobs, such as processing that requires unions and joins across multiple tables, will probably work best when you can minimize the amount of data shuffled. Since reducing the number of workers in a cluster will help minimize shuffles, you should consider a smaller cluster like cluster A in the following diagram over a larger cluster like cluster D."
upvoted 1 times
dp_learner
1 month, 1 week ago
source: https://docs.databricks.com/en/clusters/cluster-config-best-practices.html
upvoted 1 times
...
...
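
The shuffle argument in this thread can be illustrated with a rough back-of-the-envelope model. The sketch below assumes shuffled records are spread uniformly and independently across executors, so with N executors roughly (N - 1)/N of the shuffled data lands on a different VM. That uniformity is an assumption made purely for illustration; real jobs have skew, locality effects, and other costs (such as garbage collection on very large heaps) that this model ignores. The function name `remote_shuffle_fraction` is hypothetical.

```python
from fractions import Fraction


def remote_shuffle_fraction(num_executors):
    """Expected share of shuffled records that land on a different executor.

    Assumes a uniform, independent spread of records across executors,
    which is a simplification and not a measurement.
    """
    if num_executors < 1:
        raise ValueError("num_executors must be at least 1")
    return Fraction(num_executors - 1, num_executors)


# Executor counts taken from the answer options (one executor per VM).
for label, vms in [("A", 1), ("E", 2), ("D", 4), ("B", 8), ("C", 16)]:
    share = remote_shuffle_fraction(vms)
    print(f"Option {label}: {vms:>2} executors, {float(share):.1%} of shuffle data crosses VMs")
```

Under this simplified model, the cross-VM share grows with the executor count, which is the intuition behind the documentation passage quoted above. It does not by itself settle which option is fastest for a given workload.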
Santitoxic
2 months, 2 weeks ago
Selected Answer: D
Considering the need for both memory and parallelism, option D seems to offer the best balance between resources and parallel processing. It provides a reasonable amount of memory and cores per Executor while maintaining a sufficient level of parallelism with 4 Executors. This configuration is likely to result in maximum performance for a job with at least one wide transformation.
upvoted 2 times
...
mwyopme
2 months, 3 weeks ago
Sorry, response C = 16 VMs, for maximizing the wide transformation.
upvoted 1 times
...
mwyopme
2 months, 3 weeks ago
The key phrase is "Given a job with at least one wide transformation". For performance, we should maximize the number of concurrent VMs. Selecting response B. 160/10 = 16 VMs.
upvoted 1 times
...
taif12340
3 months, 2 weeks ago
Selected Answer: D
Considering the need for both memory and parallelism, option D seems to offer the best balance between resources and parallel processing. It provides a reasonable amount of memory and cores per Executor while maintaining a sufficient level of parallelism with 4 Executors. This configuration is likely to result in maximum performance for a job with at least one wide transformation.
upvoted 1 times
...
BrianNguyen95
3 months, 3 weeks ago
The correct answer is E. Option E provides a substantial amount of memory and cores per executor, allowing the job to handle wide transformations efficiently. However, performance can also be influenced by factors like the nature of your specific workload, data distribution, and overall cluster utilization. It's a good practice to conduct benchmarking and performance testing with various configurations to determine the optimal setup for your specific use case.
upvoted 1 times
...
stuart_gta1
4 months ago
C. More VMs help to distribute the workload across the cluster, which results in better fault tolerance and increases the chances of job completion.
upvoted 1 times
...
asmayassineg
4 months, 1 week ago
The answer should be E. If at least one transformation is wide, one executor with 200 GB can do the job, and the rest of the tasks can be carried out on the other node.
upvoted 1 times
8605246
4 months, 1 week ago
Would it be fault-tolerant?
upvoted 1 times
...
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other
Most Voted