B is the right answer. Option B is more typical for stream-static joins because it provides a consistent snapshot of the static DataFrame for the entire duration of the job. Option A might suit specialized cases where the static DataFrame needs to be refreshed for each micro-batch.
The explanation suggests the author would like the stream-static join to work this way. However, it works the way the documentation describes; see the first sentence here: https://learn.microsoft.com/en-us/azure/databricks/transform/join#stream-static
The answer is A. When Azure Databricks processes a micro-batch of data in a stream-static join, the latest valid version of data from the static Delta table joins with the records present in the current micro-batch.
from https://learn.microsoft.com/en-us/azure/databricks/structured-streaming/delta-lake
In a stream-static join, Spark treats the static Delta table as a constant snapshot at the time the streaming query starts. This means:
The static table is loaded once when the stream starts.
All micro-batches of the stream will join with that same version of the static table.
Any updates made to the static table after the stream starts will not be reflected in the join, unless the stream is restarted.
A stream-static join joins the latest valid version of a Delta table (the static data) to a data stream using a stateless join.
When Databricks processes a micro-batch of data in a stream-static join, the latest valid version of data from the static Delta table joins with the records present in the current micro-batch. Because the join is stateless, you do not need to configure watermarking and can process results with low latency. The data in the static Delta table used in the join should be slowly-changing.
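For anyone who wants to see what such a join looks like in code, here is a minimal PySpark-style sketch. The function name, the two table paths, and the `customer_id` join key are all hypothetical and only there for illustration; the function expects an existing `SparkSession` to be passed in as `spark`.

```python
def build_stream_static_join(spark, stream_path, static_path, key="customer_id"):
    """Join a streaming Delta source to a static Delta table (illustrative sketch).

    `stream_path`, `static_path`, and `key` are hypothetical placeholders.
    """
    # Streaming side: the Delta table is read incrementally as micro-batches.
    stream_df = spark.readStream.format("delta").load(stream_path)

    # Static side: an ordinary batch read of a Delta table.
    static_df = spark.read.format("delta").load(static_path)

    # Stateless inner join: no watermark is configured on either side.
    return stream_df.join(static_df, on=key, how="inner")
```

The result is still a streaming DataFrame, so it would be written out with `writeStream` as usual. Nothing in the code pins the static side to a table version; according to the docs quoted above, the latest valid version is used for each micro-batch.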
This is straight from the docs: "A stream-static join joins the latest valid version of a Delta table (the static data) to a data stream using a stateless join.
When Azure Databricks processes a micro-batch of data in a stream-static join, the latest valid version of data from the static Delta table joins with the records present in the current micro-batch. Because the join is stateless, you do not need to configure watermarking and can process results with low latency. The data in the static Delta table used in the join should be slowly-changing."
https://learn.microsoft.com/en-us/azure/databricks/transform/join#stream-static
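To make the quoted behavior concrete, here is a toy model in plain Python. It is not Spark or Delta code: `VersionedTable` and `join_micro_batch` are made-up names that only mimic the semantics the docs describe, where each micro-batch is joined against whatever the latest version of the static table is at that moment.

```python
class VersionedTable:
    """Toy stand-in for a Delta table: every overwrite creates a new version."""

    def __init__(self, rows):
        self._versions = [dict(rows)]

    def overwrite(self, rows):
        self._versions.append(dict(rows))

    def latest(self):
        return self._versions[-1]


def join_micro_batch(batch, static_table):
    """Inner-join one micro-batch against the latest version of the static table."""
    # The lookup is resolved per micro-batch, not once when the query starts.
    lookup = static_table.latest()
    return [(key, value, lookup[key]) for key, value in batch if key in lookup]


static = VersionedTable({"a": "v1"})
first = join_micro_batch([("a", 10)], static)

# The static table changes between micro-batches.
static.overwrite({"a": "v2", "b": "v2"})
second = join_micro_batch([("a", 20), ("b", 30)], static)

print(first)   # [('a', 10, 'v1')]
print(second)  # [('a', 20, 'v2'), ('b', 30, 'v2')]
```

The second micro-batch picks up the updated rows without any restart, which is the behavior described for option A.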