Exam AWS Certified Big Data - Specialty topic 1 question 7 discussion

Exam question from Amazon's AWS Certified Big Data - Specialty
Question #: 7
Topic #: 1

A large grocery distributor receives daily depletion reports from the field in the form of gzip archives of CSV files uploaded to Amazon S3. The files range from 500MB to 5GB. These files are processed daily by an EMR job.
Recently it has been observed that the file sizes vary, and the EMR jobs take too long. The distributor needs to tune and optimize the data processing workflow with this limited information to improve the performance of the EMR job.
Which recommendation should an administrator provide?

  • A. Reduce the HDFS block size to increase the number of task processors.
  • B. Use bzip2 or Snappy rather than gzip for the archives.
  • C. Decompress the gzip archives and store the data as CSV files.
  • D. Use Avro rather than gzip for the archives.
Suggested Answer: B

Comments

viduvivek
Highly Voted 3 years, 8 months ago
B looks to be the right answer. Notes:
  • GZIP compression uses more CPU resources than Snappy or LZO, but provides a higher compression ratio. GZip is often a good choice for cold data, which is accessed infrequently.
  • Snappy or LZO are a better choice for hot data, which is accessed frequently.
  • BZip2 can also produce more compression than GZip for some types of files, at the cost of some speed when compressing and decompressing.
  • For MapReduce, if you need your compressed data to be splittable, BZip2, LZO, and Snappy formats are splittable, but GZip is not.
Reference: https://docs.cloudera.com/documentation/enterprise/5-3-x/topics/admin_data_compression_performance.html
upvoted 8 times
...
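To make the recompression behind option B concrete, here is a minimal sketch that streams a gzip archive into a bzip2 archive using only the Python standard library. It assumes the archive has already been copied to local disk; the function name `recompress_gzip_to_bzip2` and the file paths are hypothetical and do not come from the question. It covers the bzip2 half of option B only, since Snappy is not part of the standard library and its splittability is disputed in this thread.

```python
import bz2
import gzip
import shutil


def recompress_gzip_to_bzip2(src_path: str, dst_path: str,
                             chunk_size: int = 1024 * 1024) -> None:
    """Stream-decompress a .gz file and write the same bytes as .bz2.

    Data is copied in chunks, so a multi-gigabyte archive never has to
    fit in memory. Paths are local files (an assumption of this sketch).
    """
    with gzip.open(src_path, "rb") as src, bz2.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
```

A call would look like `recompress_gzip_to_bzip2("report.csv.gz", "report.csv.bz2")`, with both file names as placeholders. Note that bzip2 is slower than gzip to compress and decompress, as the notes above say, so the gain comes from parallel reads rather than from faster decompression.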
Royk2020
Most Recent 3 years, 7 months ago
Ans is D. A 5GB GZIP file will turn out to be bigger when compressed with Snappy. And Snappy is not splittable.
upvoted 1 times
...
Corram
3 years, 7 months ago
D was my choice. Bzip2 is splittable, but snappy is not. So B seems odd. Avro is splittable, so this conversion should help.
upvoted 1 times
notcloudguru
3 years, 7 months ago
B is still valid: Bzip2 if splitting. Snappy if compressing.
upvoted 1 times
...
...
emailtorajivk
3 years, 7 months ago
From the AWS whitepaper "Best Practices for Amazon EMR" (August 2013, page 16): "For instance, if you are aggregating your data (using the ingest tool of your choice) and the aggregated data files are between 500 MB to 1 GB, GZIP compression is an acceptable data compression type. However, if your data aggregation creates files larger than 1 GB, it's best to pick a compression algorithm that supports splitting." [Figure on the same page: a large file in an S3 bucket read by EMR cluster map tasks, each issuing a 64MB HTTP range request.] https://d0.awsstatic.com/whitepapers/aws-amazon-emr-best-practices.pdf
upvoted 1 times
...
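The whitepaper's point about splitting can be illustrated with some rough arithmetic. The sketch below estimates how many map tasks can read one file in parallel, assuming a 64 MB split size (the range-request size shown in the whitepaper's figure; the real value depends on cluster configuration) and assuming a non-splittable file is always read by a single task. The function name is hypothetical, and the estimate ignores the fact that the same data would have a different size under a different codec.

```python
import math

# Assumed split size, taken from the 64MB range requests in the whitepaper figure.
SPLIT_SIZE_MB = 64


def estimated_map_tasks(file_size_mb: float, splittable: bool,
                        split_size_mb: int = SPLIT_SIZE_MB) -> int:
    """Rough upper bound on map tasks that can read one file in parallel."""
    if not splittable:
        # A non-splittable archive (e.g. gzip) must be read start to end by one task.
        return 1
    return max(1, math.ceil(file_size_mb / split_size_mb))


# File sizes from the question: 500 MB lower bound, 1 GB threshold, 5 GB upper bound.
for size_mb in (500, 1024, 5 * 1024):
    print(size_mb,
          estimated_map_tasks(size_mb, splittable=False),
          estimated_map_tasks(size_mb, splittable=True))
```

Under these assumptions, a 5 GB file yields 80 parallel map tasks when the format is splittable and only 1 when it is gzip, which is consistent with the long-running jobs described in the question.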
san2020
3 years, 7 months ago
my selection B
upvoted 2 times
...
kalpanareddy
3 years, 7 months ago
Answer is B https://d0.awsstatic.com/whitepapers/aws-amazon-emr-best-practices.pdf
upvoted 1 times
...
M2
3 years, 8 months ago
B looks correct: bzip2 is splittable, and Snappy compresses and decompresses very fast.
upvoted 3 times
...
bigdatalearner
3 years, 8 months ago
Snappy is not splittable either; however, it compresses and decompresses quickly compared to gzip, so the answer would still be B.
upvoted 1 times
...
bigdatalearner
3 years, 8 months ago
B is the right answer. Reason: bzip2 and Snappy files can be split, and gzip files can't.
upvoted 3 times
...
exams
3 years, 8 months ago
B looks good
upvoted 3 times
...