Exam Certified Data Engineer Associate topic 1 question 37 discussion

Actual exam question from Databricks's Certified Data Engineer Associate
Question #: 37
Topic #: 1

A data engineer has a single-task Job that runs each morning before they begin working. After identifying an upstream data issue, they need to set up another task to run a new notebook prior to the original task.
Which of the following approaches can the data engineer use to set up the new task?

  • A. They can clone the existing task in the existing Job and update it to run the new notebook.
  • B. They can create a new task in the existing Job and then add it as a dependency of the original task.
  • C. They can create a new task in the existing Job and then add the original task as a dependency of the new task.
  • D. They can create a new job from scratch and add both tasks to run concurrently.
  • E. They can clone the existing task to a new Job and then edit it to run the new notebook.
Suggested Answer: B

Comments

Redwings538
Highly Voted 2 years ago
Selected Answer: B
It seems there is some confusion on what dependency means in this case. Option B is correct because adding the new task as a dependency of the original task means that the new task will run BEFORE the original task, which is the goal defined in the question.
upvoted 24 times
loyik65509
1 month, 1 week ago
This means the original task must run first before the new task starts. The original task will wait for the new task. This is the wrong order because we need the new task to run first to fix upstream data issues before the original task executes.
upvoted 2 times
...
...
Data_4ever
Highly Voted 2 years ago
Selected Answer: B
B is the right answer.
upvoted 15 times
...
Billybob0604
Most Recent 1 month, 1 week ago
Selected Answer: C
The new task should run before the original task, meaning the original task must depend on the new task
upvoted 3 times
...
pint414
2 months, 1 week ago
Selected Answer: B
B as the new task runs first
upvoted 1 times
...
avidlearner
2 months, 1 week ago
Selected Answer: C
I think the confusion here comes from the phrase "as a dependency", which in my opinion means "following". If we go by that wording, C is the correct answer because we want the original task to run after the new task.
upvoted 1 times
...
Usaha1
3 months, 3 weeks ago
Selected Answer: B
B, because when we add a task that is supposed to run after a previous task, the dependency ("depends on") gets added to the second task, not the first task.
upvoted 1 times
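To make the "depends on" point concrete, here is a minimal sketch of what the job settings could look like. It assumes the payload shape of the Databricks Jobs API 2.1, where each task has a `task_key` and an optional `depends_on` list of `task_key` references. The job name, task keys, and notebook paths are hypothetical, and `upstream_of` is an illustrative helper, not part of any Databricks library.

```python
import json

# Hypothetical job settings; names and paths are made up for illustration.
# Assumed shape: Jobs API 2.1 style "tasks" list with "depends_on" entries.
job_settings = {
    "name": "morning_job",
    "tasks": [
        {
            # New task: runs the notebook that handles the upstream data issue.
            "task_key": "fix_upstream",
            "notebook_task": {"notebook_path": "/Jobs/fix_upstream"},
        },
        {
            # Original task: the dependency is declared here, on the task
            # that has to wait, pointing at the task that must finish first.
            "task_key": "original_task",
            "notebook_task": {"notebook_path": "/Jobs/original"},
            "depends_on": [{"task_key": "fix_upstream"}],
        },
    ],
}


def upstream_of(settings, task_key):
    """Return the task keys that must finish before `task_key` starts."""
    for task in settings["tasks"]:
        if task["task_key"] == task_key:
            return [dep["task_key"] for dep in task.get("depends_on", [])]
    raise KeyError(task_key)


print(json.dumps(job_settings, indent=2))
print(upstream_of(job_settings, "original_task"))  # ['fix_upstream']
```

In this sketch the new task is listed in the original task's `depends_on`, which is one way to read "add it as a dependency of the original task".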
...
rohitrc8521
3 months, 3 weeks ago
Selected Answer: C
The answer is C, folks. Please pay close attention to the wording. They have deliberately constructed the wording of options B and C to confuse the audience.
upvoted 2 times
...
danishanis
3 months, 3 weeks ago
Selected Answer: C
I think the correct answer should be C and not B. Adding the new task as a dependency of the original task would mean that the original task runs first and then the new task runs. This is the opposite of what is desired in the question.
upvoted 3 times
...
brconejeros
4 months ago
Selected Answer: C
Basically because the question says "prior": "they need to set up another task to run a new notebook prior to the original task." So the correct answer is C.
upvoted 1 times
...
Rifrif
4 months, 1 week ago
Selected Answer: B
The answer is B, as the new task needs to run first.
upvoted 1 times
...
sam_chalvet
4 months, 1 week ago
Selected Answer: B
B - Even without knowing anything about Databricks, answer B is how I would want to be able to handle this scenario; it makes the most sense.
upvoted 1 times
...
806e7d2
5 months, 1 week ago
Selected Answer: B
In Databricks Jobs, you can manage task dependencies within a single job. If you want to add a new task that needs to run before the original task due to an upstream issue, the appropriate approach would be to:
1. Create a new task: this new task would run the notebook that addresses the upstream data issue.
2. Add it as a dependency of the original task: by making the original task dependent on the new task, you ensure that the new task runs first, and only after its successful completion will the original task run.
This approach ensures that the sequence of tasks is correctly managed in a single job, with dependencies explicitly defined.
upvoted 1 times
...
Colje
7 months ago
C. They can create a new task in the existing Job and then add the original task as a dependency of the new task. Why this is correct: In Databricks, you can set up a task dependency chain by adding a new task and specifying that the original task depends on the new one. This ensures that the new task will run first, followed by the original task.
upvoted 1 times
...
tangerine141
7 months, 1 week ago
Selected Answer: B
Both B and C involve dependencies between tasks, but the difference is in how the dependencies are structured:
B: "They can create a new task in the existing Job and then add it as a dependency of the original task." In this case, the new task is added as a prerequisite (dependency) for the original task. This means the new task will run first, and once it's completed, the original task will run.
C: "They can create a new task in the existing Job and then add the original task as a dependency of the new task." In this case, the original task is added as a dependency for the new task, meaning the new task will wait for the original task to finish before running.
The correct answer is B: You want the new task (the one handling the upstream issue) to run before the original task, so it should be set as a dependency of the original task.
upvoted 1 times
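The two readings above can be checked with a small ordering sketch. It is not Databricks code: it uses Python's standard `graphlib` module, and it assumes the interpretation that "X is a dependency of Y" means Y waits for X. The task names and the `run_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def run_order(depends_on):
    """Return a valid execution order for a {task: {prerequisites}} mapping."""
    return list(TopologicalSorter(depends_on).static_order())


# Option B: the new task is a dependency of the original task,
# so the original task waits for the new task.
option_b = {"original_task": {"new_task"}}

# Option C: the original task is a dependency of the new task,
# so the new task waits for the original task.
option_c = {"new_task": {"original_task"}}

print(run_order(option_b))  # ['new_task', 'original_task']
print(run_order(option_c))  # ['original_task', 'new_task']
```

Under that interpretation, only option B produces the order the question asks for, with the new notebook running before the original task.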
...
Stefan94
7 months, 1 week ago
Selected Answer: B
B is correct as Redwings538 says
upvoted 1 times
...
CID2024
8 months ago
I think the correct answer is C, because as per the statement in the question, "they need to set up another task to run a new notebook prior to the original task", i.e. the original task should run AFTER the new task. So, by creating a new task in the existing job and setting the original task as a dependency of the new task, the data engineer ensures that the new notebook runs first, followed by the original task. This approach maintains the sequence of execution required to address the upstream data issue.
upvoted 2 times
...
9d4d68a
8 months, 1 week ago
Below is what I am convinced of after checking with AI. Here is a breakdown of the differences between options B and C:
Option B: Create a new task in the existing Job and then add it as a dependency of the original task. Result: The new task will run after the original task.
Option C: Create a new task in the existing Job and then add the original task as a dependency of the new task. Result: The new task will run before the original task.
Summary: Option B: Original task → New task. Option C: New task → Original task.
In this case, Option C is the correct choice because you need the new task to run first to resolve the upstream data issue before the original task executes.
upvoted 2 times
...