An upstream source writes Parquet data as hourly batches to directories named with the current date. A nightly batch job runs the following code to ingest all data from the previous day as indicated by the date variable:
Assume that the fields customer_id and order_id serve as a composite key to uniquely identify each order.
If the upstream system is known to occasionally produce duplicate entries for a single order hours apart, which statement is correct?
Eertyy
Highly Voted 1 year, 9 months agomeatpoof
4 months, 4 weeks agoKadELbied
Most Recent 1 month, 2 weeks agoarekm
5 months, 3 weeks agoAnithec0der
6 months, 2 weeks agobenni_ale
9 months agopanya
12 months agoimatheushenrique
1 year agoDavidRou
1 year, 3 months agoJay_98_11
1 year, 5 months ago5ffcd04
1 year, 5 months agokz_data
1 year, 6 months agovivekla
1 year, 6 months agosturcu
1 year, 8 months agoStarvosxant
1 year, 8 months agothxsgod
1 year, 9 months ago