A data ingestion task requires a 1 TB JSON dataset to be written out to Parquet with a target part-file size of 512 MB. Because Parquet is being used instead of Delta Lake, built-in file-sizing features such as Auto Optimize and Auto Compaction are not available.
Which strategy will yield the best performance without shuffling data?
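One commonly cited shuffle-free approach (an assumption here, not stated in the question itself) is to control the size of Spark's input partitions when reading the JSON, so that each partition flows through narrow transformations and lands as roughly one 512 MB Parquet part-file. The sketch below illustrates that idea using the real Spark SQL setting spark.sql.files.maxPartitionBytes; the paths and column name are hypothetical placeholders.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("json-to-parquet-ingest")
    # Cap each input split at 512 MB so narrow transformations preserve
    # roughly 512 MB partitions all the way to the Parquet writer.
    .config("spark.sql.files.maxPartitionBytes", str(512 * 1024 * 1024))
    .getOrCreate()
)

# Read the ~1 TB JSON dataset; Spark packs file splits into ~512 MB partitions.
df = spark.read.json("/mnt/raw/events/")  # hypothetical input path

# Apply only narrow transformations (no joins, groupBys, or repartition calls)
# so the 512 MB partitioning survives without triggering a shuffle.
cleaned = df.filter(df["event_type"].isNotNull())  # hypothetical column

# Each partition is written as one part-file, yielding ~512 MB Parquet files.
cleaned.write.mode("overwrite").parquet("/mnt/curated/events/")  # hypothetical output path
```

Because no wide transformation is introduced, the partition sizes established at read time carry through to the write, which is the property the question is probing.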