An upstream system is emitting change data capture (CDC) logs that are being written to a cloud object storage directory. Each record in the log indicates the change type (insert, update, or delete) and the values for each field after the change. The source table has a primary key identified by the field pk_id.
For auditing purposes, the data governance team wishes to maintain a full record of all values that have ever been valid in the source system. For analytical purposes, only the most recent value for each record needs to be retained. The Databricks job that ingests these records runs once per hour, but an individual record may change multiple times within that hour.
Which solution meets these requirements?
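To make the two requirements concrete, here is a minimal plain-Python sketch (not Spark, and not necessarily the intended exam answer) of what one hourly run has to achieve: every change record is kept for auditing, while the analytical view holds only the latest state per pk_id. The field names change_type, seq, and value, and the function apply_batch, are hypothetical; the question only names pk_id and does not say which field orders the changes.

```python
from typing import Any

# Hypothetical batch of CDC records from one hourly run.
# Only pk_id comes from the question; change_type, seq (an assumed
# ordering column), and value are made up for illustration.
batch = [
    {"pk_id": 1, "change_type": "insert", "seq": 1, "value": "a"},
    {"pk_id": 1, "change_type": "update", "seq": 2, "value": "b"},
    {"pk_id": 2, "change_type": "insert", "seq": 3, "value": "x"},
    {"pk_id": 2, "change_type": "delete", "seq": 4, "value": "x"},
    {"pk_id": 3, "change_type": "insert", "seq": 5, "value": "q"},
]


def apply_batch(
    history: list[dict[str, Any]],
    current: dict[int, Any],
    records: list[dict[str, Any]],
) -> None:
    """Append all records to the audit history and upsert the latest state."""
    # Audit requirement: keep every change record, unmodified.
    history.extend(records)

    # A key may change several times within one batch, so first reduce
    # the batch to the most recent record per pk_id.
    latest: dict[int, dict[str, Any]] = {}
    for rec in records:
        prev = latest.get(rec["pk_id"])
        if prev is None or rec["seq"] > prev["seq"]:
            latest[rec["pk_id"]] = rec

    # Analytical requirement: apply only that latest change per key.
    for pk, rec in latest.items():
        if rec["change_type"] == "delete":
            current.pop(pk, None)
        else:
            current[pk] = rec["value"]


history: list[dict[str, Any]] = []
current: dict[int, Any] = {}
apply_batch(history, current, batch)
```

After this run, history holds all five change records, while current holds only the surviving keys with their most recent values. On Databricks the same shape would typically correspond to an append-only table for the raw log plus a table updated by a merge keyed on pk_id, but the choice among the original answer options is not shown here.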