Exam AWS Certified Machine Learning - Specialty topic 1 question 31 discussion

A monitoring service generates 1 TB of scale metrics record data every minute. A research team performs queries on this data using Amazon Athena. The queries run slowly due to the large volume of data, and the team requires better performance.
How should the records be stored in Amazon S3 to improve query performance?

  • A. CSV files
  • B. Parquet files
  • C. Compressed JSON
  • D. RecordIO
Suggested Answer: B

Comments

gaku1016
Highly Voted 2 years, 7 months ago
The answer is B. Athena performs best with the Parquet format.
upvoted 23 times
...
emailtorajivk
Highly Voted 2 years, 6 months ago
You can improve the performance of your query by compressing, partitioning, or converting your data into columnar formats. Amazon Athena supports open-source columnar data formats such as Apache Parquet and Apache ORC. Converting your data into a compressed, columnar format lowers your cost and improves query performance by enabling Athena to scan less data from S3 when executing your query.
upvoted 13 times
...
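A minimal sketch of what the comment above describes, converting row-oriented CSV metric records into compressed, columnar Parquet before uploading to S3 (the file names and columns are hypothetical; pandas with the pyarrow engine is assumed to be available):

import pandas as pd

# Read a batch of raw metric records; CSV is row-oriented and uncompressed,
# so Athena must scan every byte of every record.
df = pd.read_csv("metrics_batch.csv")

# Write the same records as Snappy-compressed Parquet. Athena can then read
# only the columns a query references, scanning far less data from S3.
df.to_parquet("metrics_batch.snappy.parquet", compression="snappy", index=False)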
JonSno
Most Recent 2 months, 2 weeks ago
Selected Answer: B
Amazon Athena performs best when querying columnar storage formats like Apache Parquet. Given that 1 TB of data is generated every minute, optimizing the storage format is critical for query performance and cost efficiency. Why Parquet (B) is the best choice:
  • Columnar storage: Parquet stores data by columns instead of rows, so Athena scans only the columns a query needs, reducing the amount of data read.
  • Compression efficiency: Parquet compresses data more effectively than CSV or JSON; smaller files mean faster queries and lower costs.
  • Efficient query performance: Parquet supports predicate pushdown, so queries can skip irrelevant data instead of scanning the entire dataset.
  • Optimized for big data and Athena: designed for big-data workloads in Athena, Redshift Spectrum, and Presto, and works well with S3 partitioning to improve query speed.
upvoted 2 times
...
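As a hedged illustration of the conversion and partitioning points in the comment above, the sketch below submits an Athena CTAS statement through boto3 to rewrite an existing raw table as Snappy-compressed Parquet partitioned by date; the bucket, database, table, and column names are hypothetical:

import boto3

athena = boto3.client("athena")

# CTAS: rewrite the raw metrics table as partitioned, compressed Parquet.
# The partition column (dt) must be listed last in the SELECT.
ctas = """
CREATE TABLE metrics_parquet
WITH (
    format = 'PARQUET',
    parquet_compression = 'SNAPPY',
    external_location = 's3://example-bucket/metrics-parquet/',
    partitioned_by = ARRAY['dt']
) AS
SELECT metric_name, metric_value, host, dt
FROM metrics_raw
"""

athena.start_query_execution(
    QueryString=ctas,
    QueryExecutionContext={"Database": "monitoring"},
    ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
)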
loict
7 months, 3 weeks ago
Selected Answer: B
A. NO - row-based CSV is slower to scan
B. YES - Parquet is natively supported by Athena/Presto
C. NO - compressed JSON is still row-oriented, so Athena must read whole records
D. NO - no built-in Athena support for RecordIO
upvoted 2 times
...
teka112233
8 months, 2 weeks ago
Selected Answer: B
According to https://dzone.com/articles/how-to-be-a-hero-with-powerful-parquet-google-and, the query run time over the Parquet file was 6.78 seconds, while it was 236 seconds on the same data stored as a CSV file, which means the Parquet file is roughly 34x faster than the CSV file.
upvoted 1 times
...
apprehensive_scar
2 years, 2 months ago
Selected Answer: B
B it is
upvoted 3 times
...
benson2021
2 years, 6 months ago
The answer is B (see https://aws.amazon.com/tw/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/). But why does this question relate to machine learning?
upvoted 3 times
AddiWei
2 years, 2 months ago
Because you must be able to explore data quickly with SQL in order to run EDA and analyze data for ML purposes. Those explorations can inform the selection of features used for modeling.
upvoted 5 times
...
...
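A small sketch of the kind of quick SQL exploration described in the reply above, pulling an Athena result straight into a pandas DataFrame with the AWS SDK for pandas (awswrangler); the database, table, and column names are hypothetical:

import awswrangler as wr

# Aggregate a day of metrics for exploratory analysis / feature selection.
df = wr.athena.read_sql_query(
    sql="""
        SELECT metric_name,
               approx_percentile(metric_value, 0.95) AS p95
        FROM metrics_parquet
        WHERE dt = '2024-01-01'
        GROUP BY metric_name
    """,
    database="monitoring",
)
print(df.head())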