Exam Certified Data Engineer Professional topic 1 question 21 discussion

Actual exam question from Databricks's Certified Data Engineer Professional
Question #: 21
Topic #: 1

A Structured Streaming job deployed to production has been experiencing delays during peak hours of the day. At present, during normal execution, each microbatch of data is processed in less than 3 seconds. During peak hours of the day, execution time for each microbatch becomes very inconsistent, sometimes exceeding 30 seconds. The streaming write is currently configured with a trigger interval of 10 seconds.
Holding all other variables constant and assuming records need to be processed in less than 10 seconds, which adjustment will meet the requirement?

  • A. Decrease the trigger interval to 5 seconds; triggering batches more frequently allows idle executors to begin processing the next batch while longer running tasks from previous batches finish.
  • B. Increase the trigger interval to 30 seconds; setting the trigger interval near the maximum execution time observed for each batch is always best practice to ensure no records are dropped.
  • C. The trigger interval cannot be modified without modifying the checkpoint directory; to maintain the current stream state, increase the number of shuffle partitions to maximize parallelism.
  • D. Use the trigger once option and configure a Databricks job to execute the query every 10 seconds; this ensures all backlogged records are processed with each batch.
  • E. Decrease the trigger interval to 5 seconds; triggering batches more frequently may prevent records from backing up and large batches from causing spill.
Suggested Answer: D
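
The reasoning behind the trigger-interval options can be checked with a toy model. The sketch below is not Spark code and not part of the original question: the arrival rate, fixed per-batch overhead, and per-record cost are made-up numbers, and the function names `simulate` and `worst_latency` are hypothetical. It assumes a batch starts at the next trigger tick, or immediately if the previous batch overran, and that each batch picks up every record that arrived since the previous batch started.

```python
def simulate(trigger_s, arrivals_per_s, horizon_s,
             fixed_cost_s=0.5, cost_per_record_s=0.0002):
    """Toy micro-batch model. Returns a list of (start, records, duration)."""
    batches = []
    clock = 0.0      # start time of the current batch
    last_cut = 0.0   # records that arrived before this time are already consumed
    while clock < horizon_s:
        # The batch takes everything that arrived since the previous batch started.
        records = arrivals_per_s * (clock - last_cut)
        duration = fixed_cost_s + cost_per_record_s * records
        batches.append((clock, records, duration))
        last_cut = clock
        # Next batch: the next trigger tick, or right away if this batch overran.
        next_tick = (clock // trigger_s + 1) * trigger_s
        clock = max(next_tick, clock + duration)
    return batches


def worst_latency(batches):
    """Longest wait from a record's arrival until its batch finishes."""
    worst = 0.0
    for (start, _, _), (next_start, _, next_duration) in zip(batches, batches[1:]):
        # A record arriving just after `start` is finished at the end of the next batch.
        worst = max(worst, next_start + next_duration - start)
    return worst


if __name__ == "__main__":
    for trigger in (10, 5):
        run = simulate(trigger_s=trigger, arrivals_per_s=2000, horizon_s=60)
        print(f"trigger={trigger}s worst latency={worst_latency(run):.1f}s")
```

Under these assumed numbers, the 10-second trigger leaves some records waiting longer than 10 seconds, while the 5-second trigger keeps batches smaller and the worst-case wait under 10 seconds. The model uses a linear cost, so it does not capture spill, which would make large batches even slower.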

Comments

ofed
1 month ago
Only C. Even if you trigger more frequently, you decrease both the load and the time needed for that load. E doesn't change anything.
upvoted 1 times
...
sturcu
2 months ago
Selected Answer: E
Changing the trigger to "once" will make this a "batch" job, and it will not execute in microbatches. This will not help at all.
upvoted 2 times
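
For reference, the two trigger styles being compared in this thread can be written roughly as follows in PySpark. This is a minimal sketch, not taken from the question: the helper names `trigger_options` and `start_stream`, the Delta sink, the append output mode, and both paths are assumptions for illustration. A `processingTime` trigger keeps the query running in microbatches, whereas `once=True` processes the available data in a single batch and then stops.

```python
def trigger_options(mode, interval_seconds=None):
    """Build keyword arguments for DataStreamWriter.trigger() (hypothetical helper)."""
    if mode == "interval":
        if interval_seconds is None or interval_seconds <= 0:
            raise ValueError("interval_seconds must be a positive number")
        # Fixed-interval microbatches, e.g. options A and E in the question.
        return {"processingTime": f"{interval_seconds} seconds"}
    if mode == "once":
        # Single batch, then the query stops, e.g. option D in the question.
        return {"once": True}
    raise ValueError(f"unknown trigger mode: {mode!r}")


def start_stream(df, target_path, checkpoint_path,
                 mode="interval", interval_seconds=5):
    """Start a streaming write; `df` is assumed to be a streaming DataFrame."""
    return (
        df.writeStream
        .format("delta")                                # assumed sink format
        .outputMode("append")                           # assumed output mode
        .option("checkpointLocation", checkpoint_path)  # hypothetical path
        .trigger(**trigger_options(mode, interval_seconds))
        .start(target_path)
    )
```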
...
Eertyy
2 months, 3 weeks ago
correct answer is E
upvoted 1 times
...
azurearch
3 months ago
Sorry, the caveat is "holding all other variables constant". That means we are not allowed to change trigger intervals. Is C the answer, then?
upvoted 1 times
...
azurearch
3 months ago
What if more records arrive within that 5-second trigger interval? That would still increase the time it takes to process them, so I doubt E is correct. I will go with answer D. It is not about executing all queries within 10 seconds; it is about executing a "trigger now" batch every 10 seconds.
upvoted 1 times
...
azurearch
3 months ago
Option A is also about setting the trigger interval to 5 seconds. Just to understand: why is it not the answer?
upvoted 1 times
...
cotardo2077
3 months, 1 week ago
Selected Answer: E
for sure E
upvoted 2 times
...
Eertyy
3 months, 1 week ago
correct answer is E
upvoted 2 times
...
asmayassineg
4 months, 1 week ago
Correct answer is E. D means a job would need to acquire resources within 10 seconds, which is impossible without serverless.
upvoted 3 times
...