You've migrated a Hadoop job from an on-prem cluster to Dataproc and GCS. Your Spark job is a complex analytical workload with many shuffle operations, and the input data are Parquet files (200-400 MB each on average). You see some performance degradation after the migration to Dataproc, and you'd like to optimize it. Keep in mind that your organization is very cost-sensitive, so you'd like to continue running this workload on Dataproc with preemptible workers (and only 2 non-preemptible workers).
What should you do?
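A common way to address shuffle-heavy Spark jobs on mostly-preemptible Dataproc clusters is Enhanced Flexibility Mode (EFM), which writes shuffle data to the non-preemptible primary workers so it survives preemptions. The question does not name EFM, so treat the following as a hedged sketch; the cluster name, region, and machine types are placeholder assumptions.

```shell
# Sketch: recreate the cluster with EFM so shuffle data lives only on the
# 2 non-preemptible primary workers, not on preemptible secondaries.
# Names, region, and machine types below are illustrative assumptions.
gcloud dataproc clusters create shuffle-optimized-cluster \
  --region=us-central1 \
  --num-workers=2 \
  --num-secondary-workers=10 \
  --secondary-worker-type=preemptible \
  --worker-machine-type=n1-standard-8 \
  --properties=dataproc:efm.spark.shuffle=primary-worker
```

With EFM enabled, losing a preemptible worker no longer forces Spark to recompute the shuffle blocks it held, which is typically the main source of degradation for shuffle-heavy jobs on preemptible-majority clusters.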