A data ingestion task requires a 1 TB JSON dataset to be written out to Parquet with a target part-file size of 512 MB. Because Parquet is being used instead of Delta Lake, built-in file-sizing features such as Auto Optimize and Auto Compaction cannot be used.
Which strategy will yield the best performance without shuffling data?
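A minimal PySpark sketch of the read-time approach this question points toward: sizing input partitions via spark.sql.files.maxPartitionBytes so each read task (and therefore each written part-file) lands near the 512 MB target, with no repartition or other shuffle. The paths and app name are placeholders for illustration; actual output sizes may come in under the target because Parquet compresses more aggressively than raw JSON.

```python
from pyspark.sql import SparkSession

# Set the maximum bytes packed into a single input partition to ~512 MB.
# Each read task then writes one part-file of roughly that size, so no
# repartition() or coalesce() (i.e., no shuffle) is required.
spark = (
    SparkSession.builder
    .appName("json-to-parquet-512mb")  # placeholder name
    .config("spark.sql.files.maxPartitionBytes", str(512 * 1024 * 1024))
    .getOrCreate()
)

# Placeholder paths for illustration only.
df = spark.read.json("/mnt/raw/events_json/")
df.write.mode("overwrite").parquet("/mnt/curated/events_parquet/")
```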