Exam Certified Data Engineer Professional topic 1 question 3 discussion

Actual exam question from Databricks's Certified Data Engineer Professional

Question #: 3
Topic #: 1


When scheduling Structured Streaming jobs for production, which configuration automatically recovers from query failures and keeps costs low?

A. Cluster: New Job Cluster;
Retries: Unlimited;
Maximum Concurrent Runs: Unlimited
B. Cluster: New Job Cluster;
Retries: None;
Maximum Concurrent Runs: 1
C. Cluster: Existing All-Purpose Cluster;
Retries: Unlimited;
Maximum Concurrent Runs: 1
D. Cluster: New Job Cluster;
Retries: Unlimited;
Maximum Concurrent Runs: 1
E. Cluster: Existing All-Purpose Cluster;
Retries: None;
Maximum Concurrent Runs: 1
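The three requirements in the question can be checked mechanically against each option. Below is a minimal sketch that encodes the five options as data. The `JobConfig` class, the `UNLIMITED` sentinel and the string labels are hypothetical names chosen for illustration; they are not a Databricks API.

```python
from dataclasses import dataclass

UNLIMITED = -1  # hypothetical sentinel meaning "unlimited" (illustration only)

@dataclass(frozen=True)
class JobConfig:
    cluster: str              # "new_job" or "existing_all_purpose"
    retries: int              # UNLIMITED, or a fixed count (0 = none)
    max_concurrent_runs: int  # UNLIMITED, or a positive count

OPTIONS = {
    "A": JobConfig("new_job", UNLIMITED, UNLIMITED),
    "B": JobConfig("new_job", 0, 1),
    "C": JobConfig("existing_all_purpose", UNLIMITED, 1),
    "D": JobConfig("new_job", UNLIMITED, 1),
    "E": JobConfig("existing_all_purpose", 0, 1),
}

def meets_requirements(cfg: JobConfig) -> bool:
    recovers = cfg.retries == UNLIMITED             # automatic recovery from query failures
    low_cost = cfg.cluster == "new_job"             # job compute instead of all-purpose compute
    single_instance = cfg.max_concurrent_runs == 1  # one active instance per query
    return recovers and low_cost and single_instance

print([name for name, cfg in OPTIONS.items() if meets_requirements(cfg)])  # → ['D']
```

Only one option satisfies all three conditions at once; each of the others fails at least one of them.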


Suggested Answer: D

by 8605246 at Aug. 5, 2023, 7:43 a.m.

Comments


8605246

Highly Voted 2 years ago

The suggested answer is correct. Maximum concurrent runs: set to 1, because there must be only one instance of each query concurrently active. Retries: set to Unlimited. https://docs.databricks.com/en/structured-streaming/query-recovery.html

upvoted 11 times
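For reference, option D roughly corresponds to a Databricks Jobs API 2.1 `jobs/create` request body like the sketch below: `max_concurrent_runs` is set at the job level, each task gets its own `new_cluster` (a job cluster), and `max_retries: -1` means retry indefinitely. The job name, task key, notebook path, runtime version and node type are hypothetical placeholders, not values taken from the question or the linked docs; substitute values valid in your own workspace.

```python
import json

# Sketch of a Jobs API 2.1 create-job payload for option D.
# All names, paths and cluster values below are hypothetical placeholders.
job_settings = {
    "name": "orders-stream",   # hypothetical job name
    "max_concurrent_runs": 1,  # only one active instance of the streaming query
    "tasks": [
        {
            "task_key": "ingest_orders",  # hypothetical task key
            "notebook_task": {"notebook_path": "/Repos/etl/ingest_orders"},  # hypothetical path
            "new_cluster": {               # new job cluster, not an existing all-purpose cluster
                "spark_version": "13.3.x-scala2.12",  # placeholder runtime
                "node_type_id": "i3.xlarge",          # placeholder node type
                "num_workers": 2,
            },
            "max_retries": -1,  # -1 = retry indefinitely ("Retries: Unlimited")
        }
    ],
}

print(json.dumps(job_settings, indent=2))
```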


363c4c5

Most Recent 1 month ago

Selected Answer: D

New Job Cluster: a new job cluster gives the job dedicated, appropriately sized compute, which manages cost and performance more effectively than an existing all-purpose cluster.
Retries: Unlimited: the job automatically recovers from failures by retrying until it succeeds.
Maximum Concurrent Runs: 1: only one instance of the job can run at a time, which controls cost and avoids resource contention.
Databricks recommends jobs compute instead of all-purpose compute when scheduling workflows, because it uses resources more efficiently and reduces cost.
https://learn.microsoft.com/en-us/azure/databricks/structured-streaming/production
https://learn.microsoft.com/en-us/azure/databricks/jobs/continuous

upvoted 1 time


79f0e18

1 month, 1 week ago

Selected Answer: D

When running Structured Streaming jobs in production, you want:
- Automatic failure recovery → set Retries: Unlimited.
- Efficient cost control → use a New Job Cluster, which auto-terminates after the job completes.
- Concurrency control → Maximum Concurrent Runs: 1 prevents overlapping runs, which can corrupt streaming state or double-process data.

upvoted 1 time
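The recovery and concurrency behaviour described above can be illustrated with a toy model. This is a sketch only, not how the Databricks Jobs scheduler is implemented: `run_with_retries` and `SingleRunGuard` are hypothetical names. `max_retries=-1` models "Retries: Unlimited", `max_retries=0` models "Retries: None", and the guard models "Maximum Concurrent Runs: 1" by skipping a trigger while a run is still active.

```python
from typing import Callable

def run_with_retries(attempt: Callable[[int], bool], max_retries: int = -1) -> int:
    """Call attempt(n) until it returns True; return the number of attempts used.

    max_retries=-1 models unlimited retries; 0 means a single attempt, no retry.
    """
    n = 0
    while True:
        n += 1
        if attempt(n):
            return n
        if max_retries != -1 and n > max_retries:
            raise RuntimeError(f"gave up after {n} attempts")

class SingleRunGuard:
    """Toy model of a one-run concurrency limit: new triggers are skipped while a run is active."""

    def __init__(self) -> None:
        self.active = False

    def trigger(self) -> bool:
        if self.active:
            return False  # skipped: another instance is already running
        self.active = True
        return True

    def finish(self) -> None:
        self.active = False

# A query that fails twice with a transient error, then succeeds.
flaky = lambda n: n >= 3
print(run_with_retries(flaky))  # → 3

guard = SingleRunGuard()
print(guard.trigger(), guard.trigger())  # → True False
```

With `max_retries=0` the same flaky query raises after its first failure, which is why option B and option E do not recover automatically.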


KadELbied

3 months, 1 week ago

Selected Answer: D

Surely D.

upvoted 1 time


codebender

4 months, 1 week ago

Selected Answer: D

It can't be all-purpose (general) compute.

upvoted 1 time


EelkeV

6 months, 1 week ago

Selected Answer: D

A job cluster auto-terminates, and you want retries for recovery.

upvoted 1 time


akashdesarda

10 months, 2 weeks ago

Selected Answer: D

Use Databricks Jobs, as it has native integration with streaming use cases. See the example job here: https://docs.databricks.com/en/structured-streaming/query-recovery.html#configure-structured-streaming-jobs-to-restart-streaming-queries-on-failure

upvoted 2 times
