Certified Data Engineer Professional exam: Topic 1, Question 212 discussion

Actual exam question from Databricks's Certified Data Engineer Professional
Question #: 212
Topic #: 1
[All Certified Data Engineer Professional Questions]

A team of data engineers is adding tables to a DLT pipeline, and many of the tables repeat the same data quality expectations. One member of the team suggests reusing these data quality rules across all tables defined for this pipeline.

What approach would allow them to do this?

  • A. Add data quality constraints to tables in this pipeline using an external job with access to pipeline configuration files.
  • B. Use global Python variables to make expectations visible across DLT notebooks included in the same pipeline.
  • C. Maintain data quality rules in a separate Databricks notebook that each DLT notebook or file can import as a library.
  • D. Maintain data quality rules in a Delta table outside of this pipeline's target schema, providing the schema name as a pipeline parameter.
Suggested Answer: D
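For context, the pattern behind answer D (from the Databricks expectations docs quoted in the comments below) keeps one row per rule in a Delta table, tagged by category, and loads the matching rows into a dictionary that a DLT expectation decorator can consume. A minimal Python sketch follows; the rules table name data_quality_rules, its columns (name, constraint, tag), and the rules_schema pipeline parameter are illustrative assumptions, not part of the question.

    import dlt
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    def get_rules(tag):
        # Rules live in a Delta table outside the pipeline's target schema;
        # the schema name is supplied as a pipeline parameter (assumption).
        schema = spark.conf.get("rules_schema")
        rules_df = spark.table(f"{schema}.data_quality_rules").where(f"tag = '{tag}'")
        return {row["name"]: row["constraint"] for row in rules_df.collect()}

    @dlt.table
    @dlt.expect_all_or_drop(get_rules("validity"))  # apply every rule with this tag
    def orders_clean():
        return spark.readStream.table("raw_orders")

Because the dictionary is built when the pipeline starts, adding or editing a row in the rules table changes the checks for every table that loads that tag, without modifying pipeline code.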

Comments

benni_ale
Highly Voted 8 months ago
Selected Answer: D
https://docs.databricks.com/en/delta-live-tables/expectations.html "You can maintain data quality rules separately from your pipeline implementations. Databricks recommends storing the rules in a Delta table with each rule categorized by a tag."
upvoted 7 times
...
Billybob0604
Most Recent 1 week ago
Selected Answer: C
The best practice for code reuse is to write the rules once in a shared utility notebook.
upvoted 1 times
...
RajeshMP2023
1 week, 3 days ago
Selected Answer: C
Reusability of data quality rules: maintaining the rules in a separate notebook lets the team centralize the expectation logic and reuse it across multiple tables and pipelines, ensuring consistency and reducing code duplication.
Importing as a library: Databricks lets you modularize code into reusable notebooks or Python files, which can be imported into other notebooks or DLT pipelines, making it easy to apply the same set of expectations across multiple tables (see the sketch after this comment).
upvoted 1 times
...
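Below is a minimal sketch of the shared-module approach described in the comments voting C; the module name quality_rules, the rule names, and the source table are illustrative assumptions, not from the question.

    # quality_rules.py -- a shared module (or utility notebook) versioned with the pipeline
    ORDER_RULES = {
        "valid_order_id": "order_id IS NOT NULL",
        "positive_amount": "amount > 0",
    }

    # A DLT notebook or Python file in the same pipeline imports the rules.
    import dlt
    from pyspark.sql import SparkSession
    from quality_rules import ORDER_RULES

    spark = SparkSession.getActiveSession()

    @dlt.table
    @dlt.expect_all(ORDER_RULES)  # the same dictionary can decorate many tables
    def orders_bronze():
        return spark.readStream.table("raw_orders")

Either way the expectations end up as a plain {name: constraint} dictionary; options C and D differ mainly in where that dictionary lives (a versioned module for C, a queryable Delta table for D).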
gloomy_marmot
1 week, 5 days ago
Selected Answer: D
https://docs.databricks.com/aws/en/dlt/expectation-patterns#portable-and-reusable-expectations The docs recommend storing the expectations in a Delta table.
upvoted 1 times
...
happyhelppy
2 weeks, 3 days ago
Selected Answer: C
Answer D is confusing when it comes to using a parameter as the schema. Having expectations defined as a Python module and imported later is described in the docs: https://docs.databricks.com/aws/en/dlt/expectation-patterns?language=Python%C2%A0Module#portable-and-reusable-expectations
upvoted 1 times
...
KadELbied
3 months, 1 week ago
Selected Answer: D
Surely D.
upvoted 1 times
...
lakime
4 months, 2 weeks ago
Selected Answer: C
Initially C, currently D
upvoted 1 times
...
arekm
7 months, 1 week ago
Selected Answer: D
D is what Databricks suggests as of now
upvoted 1 times
...
Thameur01
8 months ago
Selected Answer: C
To reuse repetitive data quality rules across multiple tables in a Delta Live Tables (DLT) pipeline, the most efficient approach is to maintain these rules in a separate notebook or Python module and import them where needed. This promotes code reusability, maintainability, and consistency.
upvoted 2 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other