Exam AWS Certified Data Analytics - Specialty topic 1 question 124 discussion

A company wants to run analytics on its Elastic Load Balancing logs stored in Amazon S3. A data analyst needs to be able to query all data from a desired year, month, or day. The data analyst should also be able to query a subset of the columns. The company requires minimal operational overhead and the most cost-effective solution.
Which approach meets these requirements for optimizing and querying the log data?

  • A. Use an AWS Glue job nightly to transform new log files into .csv format and partition by year, month, and day. Use AWS Glue crawlers to detect new partitions. Use Amazon Athena to query data.
  • B. Launch a long-running Amazon EMR cluster that continuously transforms new log files from Amazon S3 into its Hadoop Distributed File System (HDFS) storage and partitions by year, month, and day. Use Apache Presto to query the optimized format.
  • C. Launch a transient Amazon EMR cluster nightly to transform new log files into Apache ORC format and partition by year, month, and day. Use Amazon Redshift Spectrum to query the data.
  • D. Use an AWS Glue job nightly to transform new log files into Apache Parquet format and partition by year, month, and day. Use AWS Glue crawlers to detect new partitions. Use Amazon Athena to query data.
Suggested Answer: D
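
Editor's note: the year/month/day partitioning named in the options is commonly laid out as Hive-style `key=value` prefixes in S3, which lets a crawler infer the partition column names from the path. The sketch below shows one way to derive such a prefix from a log timestamp. It assumes ISO 8601 timestamps; the bucket name and the `partition_prefix` helper are hypothetical and not part of the question.

```python
from datetime import datetime, timezone


def partition_prefix(timestamp: str,
                     base: str = "s3://example-bucket/elb-logs-parquet") -> str:
    """Return a Hive-style year/month/day prefix for one log timestamp.

    `base` is a hypothetical output location used only for illustration.
    """
    # Replace the trailing "Z" so fromisoformat accepts the value on
    # Python versions before 3.11, then normalize to UTC.
    parsed = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    utc = parsed.astimezone(timezone.utc)
    return f"{base}/year={utc.year:04d}/month={utc.month:02d}/day={utc.day:02d}/"


if __name__ == "__main__":
    print(partition_prefix("2023-04-07T12:34:56.789012Z"))
```

A nightly job would write each day's Parquet files under the prefix for that day, so a query restricted to one day only needs to read one prefix.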

Comments

srinivasa
Highly Voted 3 years, 7 months ago
Answer: D
upvoted 20 times
...
Thiya
Highly Voted 3 years, 5 months ago
A - .csv format is not optimal.
B - A long-running EMR cluster is not cost-effective and has the operational overhead of cluster management.
C - Again, EMR with a running Redshift cluster is not cost-effective.
So the answer is Option D: low cost and no operational overhead. The data-scanning cost in Athena can be minimized by partition pruning and by reading only a subset of columns from the Parquet files.
upvoted 15 times
...
pk349
Most Recent 2 years ago
D: I passed the test
upvoted 3 times
...
CleverMonkey092
2 years, 1 month ago
Answer is D
upvoted 1 times
...
Mirandaali
2 years, 3 months ago
Selected Answer: D
Agree D
upvoted 2 times
...
cloudlearnerhere
2 years, 6 months ago
Correct answer is D, as the Glue job can be used to transform and partition the log files. Athena can be used to query the data. Glue and Athena are cost-effective with low operational overhead. The Parquet data format helps when querying a subset of the columns.
Option A is wrong as the CSV format is not the optimal format for storing and querying data using Athena.
Option B is wrong as the long-running Amazon EMR cluster would not be a cost-effective option.
Option C is wrong as using EMR and Redshift would not be as cost-effective, or reduce operational cost as much, as Athena and Glue.
https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/
upvoted 3 times
...
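
Editor's note: a query that takes advantage of this layout names only the columns it needs and filters on the partition columns. The sketch below builds such a SQL string. The table name `elb_logs`, the example column names, the `build_query` helper, and the choice to treat partition values as zero-padded strings are all assumptions for illustration; the real partition column types depend on how the table is defined.

```python
from typing import Optional, Sequence


def build_query(columns: Sequence[str], year: int,
                month: Optional[int] = None, day: Optional[int] = None,
                table: str = "elb_logs") -> str:
    """Build a SELECT that reads a column subset from selected partitions."""
    if not columns:
        raise ValueError("at least one column is required")
    if day is not None and month is None:
        raise ValueError("day requires month")
    # Reject anything that is not a plain identifier, since the names are
    # interpolated directly into the SQL text.
    for name in [*columns, table]:
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    # Filtering on partition columns lets the engine skip other partitions.
    predicates = [f"year = '{year:04d}'"]
    if month is not None:
        predicates.append(f"month = '{month:02d}'")
    if day is not None:
        predicates.append(f"day = '{day:02d}'")
    return (f"SELECT {', '.join(columns)} FROM {table} "
            f"WHERE {' AND '.join(predicates)}")


if __name__ == "__main__":
    print(build_query(["request_url", "elb_status_code"], 2023, 4, 7))
```

Dropping `day`, or both `day` and `month`, widens the same query to a whole month or year, which matches the requirement to query by year, month, or day.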
rocky48
2 years, 9 months ago
Selected Answer: D
upvoted 1 times
...
Ramshizzle
2 years, 10 months ago
Selected Answer: D
D as my other comment explains
upvoted 1 times
...
Ramshizzle
2 years, 10 months ago
Answer D is obvious.
A: We don't want CSV; we want ORC or Parquet.
B: A long-running EMR cluster is expensive. Storing the data in HDFS is expensive.
C: Can work, but using Redshift Spectrum is only logical if we want to combine the data with other data in Redshift.
D: This is optimal. Glue works well. Parquet with Y/M/D partitions is optimal. Athena to query the data is perfect!
upvoted 2 times
...
awsmani
3 years, 5 months ago
Very tricky question. Option C has some operational overhead, but D uses AWS Glue, which is serverless. Considering that, I might go for D.
upvoted 3 times
...