Exam AWS Certified Machine Learning - Specialty topic 1 question 31 discussion

A monitoring service generates 1 TB of scale metrics record data every minute. A research team performs queries on this data using Amazon Athena. The queries run slowly due to the large volume of data, and the team requires better performance.
How should the records be stored in Amazon S3 to improve query performance?

  • A. CSV files
  • B. Parquet files
  • C. Compressed JSON
  • D. RecordIO
Suggested Answer: B

Comments

gaku1016
Highly Voted 2 years, 7 months ago
The answer is B. Athena performs best with the Parquet format.
upvoted 23 times
...
emailtorajivk
Highly Voted 2 years, 6 months ago
You can improve the performance of your query by compressing, partitioning, or converting your data into columnar formats. Amazon Athena supports open-source columnar data formats such as Apache Parquet and Apache ORC. Converting your data into a compressed, columnar format lowers your cost and improves query performance by enabling Athena to scan less data from S3 when executing your query.
upvoted 13 times
...
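A minimal sketch of what the comment above describes, converting row-oriented CSV metric records into compressed, columnar Parquet before uploading to S3 (the file names and columns are hypothetical; pandas with the pyarrow engine is assumed to be available):

import pandas as pd

# Read a batch of raw metric records; CSV is row-oriented and uncompressed,
# so Athena must scan every byte of every record.
df = pd.read_csv("metrics_batch.csv")

# Write the same records as Snappy-compressed Parquet. Athena can then read
# only the columns a query references, scanning far less data from S3.
df.to_parquet("metrics_batch.snappy.parquet", compression="snappy", index=False)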
JonSno
Most Recent 2 months, 2 weeks ago
Selected Answer: B
Amazon Athena performs best when querying columnar storage formats like Apache Parquet. Given that 1 TB of data is generated every minute, optimizing the storage format is critical for query performance and cost efficiency. Why Parquet (B) is the best choice:
  • Columnar storage: Parquet stores data by columns instead of rows, so Athena scans only the columns a query needs, reducing the amount of data read.
  • Compression efficiency: Parquet compresses data more effectively than CSV or JSON; smaller files mean faster queries and lower costs.
  • Efficient query performance: Parquet supports predicate pushdown, so queries can skip irrelevant data instead of scanning the entire dataset.
  • Optimized for big data and Athena: designed for big-data workloads in Athena, Redshift Spectrum, and Presto, and works well with S3 partitioning to improve query speed.
upvoted 2 times
...
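As a hedged illustration of the conversion and partitioning points in the comment above, the sketch below submits an Athena CTAS statement through boto3 to rewrite an existing raw table as Snappy-compressed Parquet partitioned by date; the bucket, database, table, and column names are hypothetical:

import boto3

athena = boto3.client("athena")

# CTAS: rewrite the raw metrics table as partitioned, compressed Parquet.
# The partition column (dt) must be listed last in the SELECT.
ctas = """
CREATE TABLE metrics_parquet
WITH (
    format = 'PARQUET',
    parquet_compression = 'SNAPPY',
    external_location = 's3://example-bucket/metrics-parquet/',
    partitioned_by = ARRAY['dt']
) AS
SELECT metric_name, metric_value, host, dt
FROM metrics_raw
"""

athena.start_query_execution(
    QueryString=ctas,
    QueryExecutionContext={"Database": "monitoring"},
    ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
)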
loict
7 months, 3 weeks ago
Selected Answer: B
A. NO - row-based CSV is slower to scan
B. YES - Parquet is natively supported by Athena/Presto
C. NO - compressed JSON is still row-oriented, so Athena must read whole records
D. NO - no built-in Athena support for RecordIO
upvoted 2 times
...
teka112233
8 months, 2 weeks ago
Selected Answer: B
According to https://dzone.com/articles/how-to-be-a-hero-with-powerful-parquet-google-and, the query run time over the Parquet file was 6.78 seconds, while it was 236 seconds on the same data stored as a CSV file, which means the Parquet file is roughly 34x faster than the CSV file.
upvoted 1 times
...
apprehensive_scar
2 years, 2 months ago
Selected Answer: B
B it is
upvoted 3 times
...
benson2021
2 years, 6 months ago
The answer is B (see https://aws.amazon.com/tw/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/). But why does this question relate to machine learning?
upvoted 3 times
AddiWei
2 years, 2 months ago
Because you must be able to explore data quickly with SQL in order to run EDA and analyze data for ML purposes. Those explorations can inform the selection of features used for modeling.
upvoted 5 times
...
...
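A small sketch of the kind of quick SQL exploration described in the reply above, pulling an Athena result straight into a pandas DataFrame with the AWS SDK for pandas (awswrangler); the database, table, and column names are hypothetical:

import awswrangler as wr

# Aggregate a day of metrics for exploratory analysis / feature selection.
df = wr.athena.read_sql_query(
    sql="""
        SELECT metric_name,
               approx_percentile(metric_value, 0.95) AS p95
        FROM metrics_parquet
        WHERE dt = '2024-01-01'
        GROUP BY metric_name
    """,
    database="monitoring",
)
print(df.head())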