A data ingestion task requires a 1 TB JSON dataset to be written out to Parquet with a target part-file size of 512 MB. Because Parquet is being used instead of Delta Lake, built-in file-sizing features such as Auto Optimize and Auto Compaction cannot be used.
Which strategy will yield the best performance without shuffling data?
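A minimal sketch of one commonly cited approach, assuming a Spark/Databricks environment: set spark.sql.files.maxPartitionBytes (a standard Spark SQL setting) to 512 MB before reading, so that each input partition holds roughly 512 MB of data; with only narrow transformations and no shuffle, each partition is then written out as one part-file of about the target size. The paths below are hypothetical placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Size each input partition at ~512 MB so that, absent a shuffle,
# each read partition maps to one written part-file of roughly that size.
spark.conf.set("spark.sql.files.maxPartitionBytes", 512 * 1024 * 1024)

# Hypothetical source and target paths for illustration only.
df = spark.read.json("/mnt/raw/events/")
df.write.mode("overwrite").parquet("/mnt/curated/events/")
```

Note that the resulting Parquet files may come out smaller than 512 MB, since the setting bounds the input bytes read per partition while Parquet applies columnar compression on write.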