Exam Certified Data Engineer Professional topic 1 question 18 discussion

Actual exam question from Databricks's Certified Data Engineer Professional

Question #: 18
Topic #: 1

Which statement regarding stream-static joins and static Delta tables is correct?

A. Each microbatch of a stream-static join will use the most recent version of the static Delta table as of each microbatch.
B. Each microbatch of a stream-static join will use the most recent version of the static Delta table as of the job's initialization.
C. The checkpoint directory will be used to track state information for the unique keys present in the join.
D. Stream-static joins cannot use static Delta tables because of consistency issues.
E. The checkpoint directory will be used to track updates to the static Delta table.

Suggested Answer: A

by BrianNguyen95 at Aug. 17, 2023, 2:05 p.m.

Comments

Eertyy

Highly Voted 1 year, 9 months ago

B is the right answer. Option B is more typical for stream-static joins, as it provides a consistent snapshot of the static DataFrame for the entire duration of the job. Option A might be suitable in specialized cases where you need real-time updates of the static DataFrame for each microbatch.

upvoted 12 times

arekm

6 months ago

The explanation suggests the author would like the stream-static join to work this way. However, that is not how it actually works; see the first sentence here: https://learn.microsoft.com/en-us/azure/databricks/transform/join#stream-static

upvoted 3 times

...

hamzaKhribi

1 year, 7 months ago

The answer is A. "When Azure Databricks processes a micro-batch of data in a stream-static join, the latest valid version of data from the static Delta table joins with the records present in the current micro-batch." Source: https://learn.microsoft.com/en-us/azure/databricks/structured-streaming/delta-lake

upvoted 13 times

...

BrianNguyen95

Highly Voted 1 year, 10 months ago

correct answer is A

upvoted 6 times

...

KadELbied

Most Recent 1 month, 3 weeks ago

Selected Answer: A

Surely A

upvoted 1 times

...

JoG1221

2 months, 1 week ago

Selected Answer: B

In a stream-static join, Spark treats the static Delta table as a constant snapshot taken at the time the streaming query starts. This means the static table is loaded once when the stream starts, and all micro-batches of the stream join with that same version of the static table. Any updates made to the static table after the stream starts will not be reflected in the join unless the stream is restarted.

upvoted 1 times

...

arekm

6 months ago

Selected Answer: A

A is correct, see: https://learn.microsoft.com/en-us/azure/databricks/transform/join#stream-static

upvoted 3 times

...

Sriramiyer92

6 months, 2 weeks ago

Selected Answer: B

A stream-static join joins the latest valid version of a Delta table (the static data) to a data stream using a stateless join. When Databricks processes a micro-batch of data in a stream-static join, the latest valid version of data from the static Delta table joins with the records present in the current micro-batch. Because the join is stateless, you do not need to configure watermarking and can process results with low latency. The data in the static Delta table used in the join should be slowly-changing.

upvoted 1 times

...

Sriramiyer92

6 months, 2 weeks ago

Selected Answer: A

https://docs.databricks.com/en/transform/join.html#stream-static

upvoted 1 times

...

akashdesarda

9 months ago

Selected Answer: A

This is straight from the docs: "A stream-static join joins the latest valid version of a Delta table (the static data) to a data stream using a stateless join. When Azure Databricks processes a micro-batch of data in a stream-static join, the latest valid version of data from the static Delta table joins with the records present in the current micro-batch. Because the join is stateless, you do not need to configure watermarking and can process results with low latency. The data in the static Delta table used in the join should be slowly-changing." https://learn.microsoft.com/en-us/azure/databricks/transform/join#stream-static

upvoted 1 times

...
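
For anyone still torn between A and B, here is a toy model in plain Python of the two behaviors being debated in this thread. It does not use Spark or Delta at all: `VersionedTable`, `join_micro_batch`, and the sample keys are hypothetical names made up for illustration. It only sketches the semantics described in the docs quoted above (each micro-batch joins against the latest version of the static table, and the join keeps no state between micro-batches) next to the "snapshot at start" behavior that option B describes.

```python
from typing import Dict, List, Tuple


class VersionedTable:
    """Toy stand-in for a Delta table: every write creates a new version."""

    def __init__(self, initial: Dict[str, str]) -> None:
        self._versions: List[Dict[str, str]] = [dict(initial)]

    def write(self, updates: Dict[str, str]) -> None:
        # Copy the previous version and apply the updates as a new version.
        new_version = dict(self._versions[-1])
        new_version.update(updates)
        self._versions.append(new_version)

    def latest(self) -> Dict[str, str]:
        return self._versions[-1]


def join_micro_batch(
    batch: List[str], static_rows: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Stateless inner join: only keys present in static_rows are emitted."""
    return [(key, static_rows[key]) for key in batch if key in static_rows]


dim = VersionedTable({"u1": "bronze"})
snapshot_at_start = dict(dim.latest())  # what option B would freeze

batch_1 = ["u1", "u2"]
batch_2 = ["u1", "u2"]

# Option A semantics: look up the latest table version for every micro-batch.
per_batch_results = [join_micro_batch(batch_1, dim.latest())]
dim.write({"u1": "gold", "u2": "silver"})  # table changes between batches
per_batch_results.append(join_micro_batch(batch_2, dim.latest()))

# Option B semantics: every micro-batch reuses the snapshot taken at start.
snapshot_results = [
    join_micro_batch(batch, snapshot_at_start) for batch in (batch_1, batch_2)
]

print(per_batch_results)
print(snapshot_results)
```

In the option A run, the second micro-batch picks up the rows written between batches, while the `u2` record from the first micro-batch is simply dropped and never revisited, since nothing is kept as join state. That is also why the docs say the static table should be slowly-changing.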