A data ingestion task requires a 1 TB JSON dataset to be written out to Parquet with a target part-file size of 512 MB. Because the output format is Parquet rather than Delta Lake, built-in file-sizing features such as Auto Optimize and Auto Compaction cannot be used.
Which strategy will yield the best performance without shuffling data?
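One commonly discussed strategy for this scenario (not confirmed in the post itself) is to set `spark.sql.files.maxPartitionBytes` to 512 MB so that each input split read from the JSON source maps to roughly one output part file, letting the write happen without any repartition-induced shuffle. A minimal PySpark sketch of that idea, with hypothetical paths and the caveat that Parquet compression means output files will not match the input split size exactly:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("json-to-parquet-ingest")
    # Cap each input split at 512 MB so each read task writes one part file
    # near the target size, with no repartition()/coalesce() (and thus no shuffle).
    .config("spark.sql.files.maxPartitionBytes", str(512 * 1024 * 1024))
    .getOrCreate()
)

# Read the ~1 TB JSON source; each partition covers roughly 512 MB of input.
df = spark.read.json("/mnt/raw/events_json/")  # hypothetical source path

# Write straight to Parquet: one part file per read task, no shuffle stage.
df.write.mode("overwrite").parquet("/mnt/curated/events_parquet/")  # hypothetical target path
```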