Exam Certified Data Engineer Professional topic 1 question 29 discussion

Actual exam question from Databricks' Certified Data Engineer Professional
Question #: 29
Topic #: 1

A new data engineer notices that a critical field was omitted from an application that writes its Kafka source to Delta Lake. The field was present in the Kafka source, yet it is missing from the ingested data and from the data written to dependent, long-term storage. The retention threshold on the Kafka service is seven days, and the pipeline has been in production for three months.
Which describes how Delta Lake can help to avoid data loss of this nature in the future?

  • A. The Delta log and Structured Streaming checkpoints record the full history of the Kafka producer.
  • B. Delta Lake schema evolution can retroactively calculate the correct value for newly added fields, as long as the data was in the original source.
  • C. Delta Lake automatically checks that all fields present in the source data are included in the ingestion layer.
  • D. Data can never be permanently dropped or deleted from Delta Lake, so data loss is not possible under any circumstance.
  • E. Ingesting all raw data and metadata from Kafka to a bronze Delta table creates a permanent, replayable history of the data state.
Suggested Answer: E
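A minimal sketch of the pattern option E describes, ingesting the Kafka records as-is (payload plus all metadata columns) into a bronze Delta table; the broker address, topic name, and storage paths below are hypothetical placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read the raw Kafka stream. Broker address and topic name are hypothetical
# placeholders for this sketch.
raw_events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "kafka-broker:9092")
    .option("subscribe", "orders")
    .load()
)

# Write every Kafka column unmodified (key, value, topic, partition, offset,
# timestamp, timestampType) to a bronze Delta table. Nothing is dropped, so the
# raw payload can be re-parsed later, long after Kafka's 7-day retention expires.
query = (
    raw_events.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/bronze_orders")
    .outputMode("append")
    .start("/mnt/delta/bronze_orders")
)
```

Because the bronze table stores the unparsed value, any field that a downstream parser forgot to extract can still be recovered, which is exactly the failure described in the question.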

Comments

KadELbied
1 month, 1 week ago
Selected Answer: E
select E
upvoted 1 times
...
kishanu
2 months, 1 week ago
Selected Answer: E
E is the right answer, as the bronze table can be replayed when required.
upvoted 1 times
...
Tedet
3 months, 2 weeks ago
Selected Answer: A
Considering the Databricks documentation on change feed and your need to process new records that have not been processed yet, Option A might actually be a better fit since you're looking for a streaming solution that can continuously monitor new records. The change feed (Option D) works for batch processing changes from a specific version, which isn't ideal for real-time streaming.
upvoted 1 times
...
HairyTorso
5 months, 2 weeks ago
Selected Answer: E
E lgtm
upvoted 1 times
...
Anithec0der
6 months, 1 week ago
Selected Answer: E
When we design a pipeline, we have to make sure the data from the source lands intact in the raw/bronze layer, and that transformations happen in the refined and enterprise layers. That way we can handle situations like this one, where a necessary column was not carried through in previous pipeline runs: the new column can be rebuilt from the raw data we already have.
upvoted 3 times
...
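A hedged sketch of that backfill idea, reusing the hypothetical bronze table from the sketch above; the event schema, column names, and paths are assumptions for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.getOrCreate()

# Hypothetical event schema; `critical_field` stands in for the column the
# original parser omitted.
event_schema = StructType([
    StructField("order_id", StringType()),
    StructField("critical_field", StringType()),
])

# Because the bronze table kept every raw Kafka record, the downstream table can
# be rebuilt with the previously dropped column, even for records that have
# already aged out of Kafka's 7-day retention window.
reparsed = (
    spark.read.format("delta").load("/mnt/delta/bronze_orders")
    .select(F.from_json(F.col("value").cast("string"), event_schema).alias("event"))
    .select("event.*")
)

reparsed.write.format("delta").mode("overwrite").save("/mnt/delta/silver_orders")
```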
imatheushenrique
1 year ago
The Medallion Architecture is what option E describes: ingesting all raw data and metadata from Kafka to a bronze Delta table creates a permanent, replayable history of the data state.
upvoted 3 times
...
ojudz08
1 year, 4 months ago
Selected Answer: E
E is correct
upvoted 2 times
...
DAN_H
1 year, 4 months ago
Selected Answer: E
I think E is correct
upvoted 1 times
...
kz_data
1 year, 5 months ago
Selected Answer: E
I think E is correct
upvoted 1 times
...
alexvno
1 year, 7 months ago
Selected Answer: E
Looks good - E
upvoted 2 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other