Certified Data Engineer Professional exam: Topic 1, Question 212 discussion

Actual exam question from Databricks's Certified Data Engineer Professional
Question #: 212
Topic #: 1
[All Certified Data Engineer Professional Questions]

A team of data engineers is adding tables to a DLT pipeline, and many of the tables repeat the same data quality expectations. One member of the team suggests reusing these data quality rules across all tables defined for this pipeline.

What approach would allow them to do this?

  • A. Add data quality constraints to tables in this pipeline using an external job with access to pipeline configuration files.
  • B. Use global Python variables to make expectations visible across DLT notebooks included in the same pipeline.
  • C. Maintain data quality rules in a separate Databricks notebook that each DLT notebook or file can import as a library.
  • D. Maintain data quality rules in a Delta table outside of this pipeline's target schema, providing the schema name as a pipeline parameter.
Suggested Answer: D
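For context, the pattern behind answer D (from the Databricks expectations docs quoted in the comments below) keeps one row per rule in a Delta table, tagged by category, and loads the matching rows into a dictionary that a DLT expectation decorator can consume. A minimal Python sketch follows; the rules table name data_quality_rules, its columns (name, constraint, tag), and the rules_schema pipeline parameter are illustrative assumptions, not part of the question.

    import dlt
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    def get_rules(tag):
        # Rules live in a Delta table outside the pipeline's target schema;
        # the schema name is supplied as a pipeline parameter (assumption).
        schema = spark.conf.get("rules_schema")
        rules_df = spark.table(f"{schema}.data_quality_rules").where(f"tag = '{tag}'")
        return {row["name"]: row["constraint"] for row in rules_df.collect()}

    @dlt.table
    @dlt.expect_all_or_drop(get_rules("validity"))  # apply every rule with this tag
    def orders_clean():
        return spark.readStream.table("raw_orders")

Because the dictionary is built when the pipeline starts, adding or editing a row in the rules table changes the checks for every table that loads that tag, without modifying pipeline code.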

Comments

benni_ale
Highly Voted 8 months ago
Selected Answer: D
https://docs.databricks.com/en/delta-live-tables/expectations.html "You can maintain data quality rules separately from your pipeline implementations. Databricks recommends storing the rules in a Delta table with each rule categorized by a tag."
upvoted 7 times
...
Billybob0604
Most Recent 1 week ago
Selected Answer: C
The best practice for code reuse is to write the rules once in a shared utility notebook.
upvoted 1 times
...
RajeshMP2023
1 week, 3 days ago
Selected Answer: C
Reusability of data quality rules: maintaining the rules in a separate notebook lets the team centralize the expectation logic and reuse it across multiple tables and pipelines, ensuring consistency and reducing code duplication.
Importing as a library: Databricks lets you modularize code into reusable notebooks or Python files, which can be imported into other notebooks or DLT pipelines, making it easy to apply the same set of expectations across multiple tables (see the sketch after this comment).
upvoted 1 times
...
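Below is a minimal sketch of the shared-module approach described in the comments voting C; the module name quality_rules, the rule names, and the source table are illustrative assumptions, not from the question.

    # quality_rules.py -- a shared module (or utility notebook) versioned with the pipeline
    ORDER_RULES = {
        "valid_order_id": "order_id IS NOT NULL",
        "positive_amount": "amount > 0",
    }

    # A DLT notebook or Python file in the same pipeline imports the rules.
    import dlt
    from pyspark.sql import SparkSession
    from quality_rules import ORDER_RULES

    spark = SparkSession.getActiveSession()

    @dlt.table
    @dlt.expect_all(ORDER_RULES)  # the same dictionary can decorate many tables
    def orders_bronze():
        return spark.readStream.table("raw_orders")

Either way the expectations end up as a plain {name: constraint} dictionary; options C and D differ mainly in where that dictionary lives (a versioned module for C, a queryable Delta table for D).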
gloomy_marmot
1 week, 5 days ago
Selected Answer: D
https://docs.databricks.com/aws/en/dlt/expectation-patterns#portable-and-reusable-expectations The docs recommend storing the expectations in a Delta table.
upvoted 1 times
...
happyhelppy
2 weeks, 3 days ago
Selected Answer: C
Answer D is confusing when it comes to using a parameter as the schema. Having expectations defined as a Python module and imported later is described in the docs: https://docs.databricks.com/aws/en/dlt/expectation-patterns?language=Python%C2%A0Module#portable-and-reusable-expectations
upvoted 1 times
...
KadELbied
3 months, 1 week ago
Selected Answer: D
Surely D.
upvoted 1 times
...
lakime
4 months, 2 weeks ago
Selected Answer: C
Initially C, currently D
upvoted 1 times
...
arekm
7 months, 1 week ago
Selected Answer: D
D is what Databricks suggests as of now
upvoted 1 times
...
Thameur01
8 months ago
Selected Answer: C
To reuse repetitive data quality rules across multiple tables in a Delta Live Tables (DLT) pipeline, the most efficient approach is to maintain these rules in a separate notebook or Python module and import them where needed. This promotes code reusability, maintainability, and consistency.
upvoted 2 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other