A large grocery distributor receives daily depletion reports from the field in the form of gzip archives of CSV files uploaded to Amazon S3. The files range from 500 MB to 5 GB. These files are processed daily by an EMR job.
Recently it has been observed that the file sizes vary and the EMR jobs take too long. The distributor needs to tune and optimize the data processing workflow with this limited information to improve the performance of the EMR job.
Which recommendation should an administrator provide?