
Exam AWS Certified Data Analytics - Specialty topic 1 question 23 discussion

A company wants to optimize the cost of its data and analytics platform. The company is ingesting a number of .csv and JSON files in Amazon S3 from various data sources. Incoming data is expected to be 50 GB each day. The company is using Amazon Athena to query the raw data in Amazon S3 directly. Most queries aggregate data from the past 12 months, and data that is older than 5 years is infrequently queried. The typical query scans about 500 MB of data and is expected to return results in less than 1 minute. The raw data must be retained indefinitely for compliance requirements.
Which solution meets the company's requirements?

  • A. Use an AWS Glue ETL job to compress, partition, and convert the data into a columnar data format. Use Athena to query the processed dataset. Configure a lifecycle policy to move the processed data into the Amazon S3 Standard-Infrequent Access (S3 Standard-IA) storage class 5 years after object creation. Configure a second lifecycle policy to move the raw data into Amazon S3 Glacier for long-term archival 7 days after object creation.
  • B. Use an AWS Glue ETL job to partition and convert the data into a row-based data format. Use Athena to query the processed dataset. Configure a lifecycle policy to move the data into the Amazon S3 Standard-Infrequent Access (S3 Standard-IA) storage class 5 years after object creation. Configure a second lifecycle policy to move the raw data into Amazon S3 Glacier for long-term archival 7 days after object creation.
  • C. Use an AWS Glue ETL job to compress, partition, and convert the data into a columnar data format. Use Athena to query the processed dataset. Configure a lifecycle policy to move the processed data into the Amazon S3 Standard-Infrequent Access (S3 Standard-IA) storage class 5 years after the object was last accessed. Configure a second lifecycle policy to move the raw data into Amazon S3 Glacier for long-term archival 7 days after the last date the object was accessed.
  • D. Use an AWS Glue ETL job to partition and convert the data into a row-based data format. Use Athena to query the processed dataset. Configure a lifecycle policy to move the data into the Amazon S3 Standard-Infrequent Access (S3 Standard-IA) storage class 5 years after the object was last accessed. Configure a second lifecycle policy to move the raw data into Amazon S3 Glacier for long-term archival 7 days after the last date the object was accessed.
Suggested Answer: A
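For illustration only, the two lifecycle rules described in option A could be expressed with boto3 roughly as in the sketch below; the bucket name and the raw/ and processed/ prefixes are hypothetical assumptions, not part of the question.

```python
# Sketch of the two lifecycle rules from option A (bucket name and prefixes are hypothetical).
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="example-analytics-bucket",  # hypothetical bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                # Processed (columnar) data: move to Standard-IA 5 years (1825 days) after creation
                "ID": "processed-to-standard-ia",
                "Filter": {"Prefix": "processed/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 1825, "StorageClass": "STANDARD_IA"}],
            },
            {
                # Raw .csv/JSON data: archive to Glacier 7 days after creation, retained indefinitely
                "ID": "raw-to-glacier",
                "Filter": {"Prefix": "raw/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 7, "StorageClass": "GLACier".upper()}],
            },
        ]
    },
)
```

Note that lifecycle transitions count days from object creation; S3 lifecycle rules cannot be triggered by last access, which is why options C and D are not viable.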

Comments

astalavista1
Highly Voted 3 years, 3 months ago
Selected Answer: A
Agree with answer A. C and D are eliminated because lifecycle policies are based on object creation, not last access. Compressing saves cost, and converting to a columnar data format improves query performance.
upvoted 11 times
...
cloudlearnerhere
Highly Voted 2 years, 8 months ago
Selected Answer: A
The correct answer is A, as a columnar data format stores data efficiently through column-wise compression and enables splitting and parallel processing. Storing processed data in S3 Standard-IA and moving raw data to Glacier helps reduce costs. Options B and D are wrong because a columnar data format is recommended for analytics workloads. Option C is wrong because lifecycle rules are based on the object creation date, not the date the object was last accessed. (A sketch of such a conversion job is shown below.)
upvoted 6 times
...
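As a rough illustration of the conversion step discussed in the comment above, a Glue ETL job that rewrites the raw data as compressed, partitioned Parquet might look like this sketch; the database, table, output path, and partition columns are hypothetical assumptions.

```python
# Sketch of a Glue ETL job converting raw .csv/JSON data to partitioned Parquet.
# Database, table, output path, and partition columns are hypothetical.
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the raw data as cataloged by a Glue crawler (hypothetical database/table names)
raw = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="events_raw"
)

# Write Snappy-compressed Parquet, partitioned by date columns assumed to exist in the data
glue_context.write_dynamic_frame.from_options(
    frame=raw,
    connection_type="s3",
    connection_options={
        "path": "s3://example-analytics-bucket/processed/",
        "partitionKeys": ["year", "month", "day"],
    },
    format="parquet",
    format_options={"compression": "snappy"},
)
job.commit()
```

With the processed data partitioned and stored as Parquet, a typical 12-month aggregation in Athena can prune partitions and read only the columns it needs, which keeps scans near the 500 MB target described in the question.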
GLam123
Most Recent 1 year, 8 months ago
Selected Answer: A
Columnar format, with lifecycle rules based on object creation time.
upvoted 1 times
...
NikkyDicky
1 year, 11 months ago
Selected Answer: A
A makes sense.
upvoted 1 times
...
pk349
2 years, 2 months ago
A: I passed the test
upvoted 2 times
...
Arka_01
2 years, 10 months ago
Selected Answer: A
It should be based on object creation, not on object access.
upvoted 1 times
...
rocky48
3 years ago
Selected Answer: A
Answer is A
upvoted 1 times
...
ru4aws
3 years ago
Selected Answer: A
It should be 5 years after object creation to move processed data to Standard-IA, and 7 days after object creation to move raw data to Glacier. There is no point in counting days from "last accessed".
upvoted 2 times
...
dushmantha
3 years ago
Selected Answer: A
A columnar data format is a way of optimizing (eliminates B and D), and the lifecycle policy should be based on object creation (eliminates C). The answer is A.
upvoted 1 times
...
Bik000
3 years, 2 months ago
Selected Answer: A
Answer is A
upvoted 1 times
...
azi_2021
3 years, 3 months ago
The answer should be A.
upvoted 2 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other