Exam Certified Data Engineer Professional topic 1 question 29 discussion

Actual exam question from Databricks' Certified Data Engineer Professional
Question #: 29
Topic #: 1

A new data engineer notices that a critical field was omitted from an application that writes its Kafka source to Delta Lake. The field was present in the Kafka source, yet it is missing from the ingested data and from the data written to dependent, long-term storage. The retention threshold on the Kafka service is seven days, and the pipeline has been in production for three months.
Which describes how Delta Lake can help to avoid data loss of this nature in the future?

  • A. The Delta log and Structured Streaming checkpoints record the full history of the Kafka producer.
  • B. Delta Lake schema evolution can retroactively calculate the correct value for newly added fields, as long as the data was in the original source.
  • C. Delta Lake automatically checks that all fields present in the source data are included in the ingestion layer.
  • D. Data can never be permanently dropped or deleted from Delta Lake, so data loss is not possible under any circumstance.
  • E. Ingesting all raw data and metadata from Kafka to a bronze Delta table creates a permanent, replayable history of the data state.
Suggested Answer: E
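A minimal sketch of the pattern option E describes, ingesting the Kafka records as-is (payload plus all metadata columns) into a bronze Delta table; the broker address, topic name, and storage paths below are hypothetical placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read the raw Kafka stream. Broker address and topic name are hypothetical
# placeholders for this sketch.
raw_events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "kafka-broker:9092")
    .option("subscribe", "orders")
    .load()
)

# Write every Kafka column unmodified (key, value, topic, partition, offset,
# timestamp, timestampType) to a bronze Delta table. Nothing is dropped, so the
# raw payload can be re-parsed later, long after Kafka's 7-day retention expires.
query = (
    raw_events.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/bronze_orders")
    .outputMode("append")
    .start("/mnt/delta/bronze_orders")
)
```

Because the bronze table stores the unparsed value, any field that a downstream parser forgot to extract can still be recovered, which is exactly the failure described in the question.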

Comments

KadELbied
1 month, 1 week ago
Selected Answer: E
select E
upvoted 1 times
...
kishanu
2 months, 1 week ago
Selected Answer: E
E is the right answer, as the bronze table can be replayed when required.
upvoted 1 times
...
Tedet
3 months, 2 weeks ago
Selected Answer: A
Considering the Databricks documentation on change feed and your need to process new records that have not been processed yet, Option A might actually be a better fit since you're looking for a streaming solution that can continuously monitor new records. The change feed (Option D) works for batch processing changes from a specific version, which isn't ideal for real-time streaming.
upvoted 1 times
...
HairyTorso
5 months, 2 weeks ago
Selected Answer: E
E lgtm
upvoted 1 times
...
Anithec0der
6 months, 1 week ago
Selected Answer: E
When we design a pipeline, we have to make sure the data from the source lands intact in the raw/bronze layer, and that transformations happen in the refined and enterprise layers. That way we can handle situations like this one, where a necessary column was not carried through in previous pipeline runs: the new column can be rebuilt from the raw data we already have.
upvoted 3 times
...
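A hedged sketch of that backfill idea, reusing the hypothetical bronze table from the sketch above; the event schema, column names, and paths are assumptions for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.getOrCreate()

# Hypothetical event schema; `critical_field` stands in for the column the
# original parser omitted.
event_schema = StructType([
    StructField("order_id", StringType()),
    StructField("critical_field", StringType()),
])

# Because the bronze table kept every raw Kafka record, the downstream table can
# be rebuilt with the previously dropped column, even for records that have
# already aged out of Kafka's 7-day retention window.
reparsed = (
    spark.read.format("delta").load("/mnt/delta/bronze_orders")
    .select(F.from_json(F.col("value").cast("string"), event_schema).alias("event"))
    .select("event.*")
)

reparsed.write.format("delta").mode("overwrite").save("/mnt/delta/silver_orders")
```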
imatheushenrique
1 year ago
The Medallion Architecture is what option E describes: ingesting all raw data and metadata from Kafka to a bronze Delta table creates a permanent, replayable history of the data state.
upvoted 3 times
...
ojudz08
1 year, 4 months ago
Selected Answer: E
E is correct
upvoted 2 times
...
DAN_H
1 year, 4 months ago
Selected Answer: E
I think E is correct
upvoted 1 times
...
kz_data
1 year, 5 months ago
Selected Answer: E
I think E is correct
upvoted 1 times
...
alexvno
1 year, 7 months ago
Selected Answer: E
Looks good - E
upvoted 2 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other