You've migrated a Hadoop job from an on-premises cluster to Dataproc and GCS. Your Spark job is a complex analytical workload with many shuffle operations, and the input data are Parquet files (200–400 MB each on average). You see some performance degradation after the migration to Dataproc, and you'd like to optimize it. Keep in mind that your organization is very cost-sensitive, so you want to continue running this workload on Dataproc with preemptible workers (and only 2 non-preemptible workers).
What should you do?
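For context, a cluster shaped like the one in the question (2 non-preemptible primary workers plus a pool of preemptible secondary workers) could be created roughly as follows. This is a sketch, not a recommended answer: the cluster name, region, machine type, and secondary-worker count are illustrative assumptions, not values from the question.

```shell
# Illustrative only: create a Dataproc cluster matching the scenario.
# 2 primary (non-preemptible) workers, plus preemptible secondary workers
# to keep costs down. Name, region, sizes are hypothetical placeholders.
gcloud dataproc clusters create example-analytics-cluster \
  --region=us-central1 \
  --num-workers=2 \
  --num-secondary-workers=8 \
  --secondary-worker-type=preemptible \
  --worker-machine-type=n1-standard-8
```

Note that with preemptible secondary workers, shuffle-heavy Spark jobs are the workloads most exposed to worker preemption, since lost nodes can force shuffle data to be recomputed.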