Exam AWS Certified Data Analytics - Specialty All Questions
Exam AWS Certified Data Analytics - Specialty topic 1 question 45 discussion

Once a month, a company receives a 100 MB .csv file compressed with gzip. The file contains 50,000 property listing records and is stored in Amazon S3 Glacier.
The company needs its data analyst to query a subset of the data for a specific vendor.
What is the most cost-effective solution?

  • A. Load the data into Amazon S3 and query it with Amazon S3 Select.
  • B. Query the data from Amazon S3 Glacier directly with Amazon Glacier Select.
  • C. Load the data to Amazon S3 and query it with Amazon Athena.
  • D. Load the data to Amazon S3 and query it with Amazon Redshift Spectrum.
Suggested Answer: A
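As a sketch of what answer A looks like in practice, the function below builds the parameters for boto3's `select_object_content` against a gzip-compressed CSV. The bucket, key, and `vendor` column name are hypothetical, and the live AWS call is left commented out since it needs credentials.

```python
# Sketch of answer A: after the archive is available in S3, query the
# gzip-compressed CSV with S3 Select. Names below are hypothetical.

def build_select_params(bucket, key, vendor):
    """Build select_object_content parameters that filter a
    gzip-compressed CSV by a 'vendor' header column."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        # Assumes the CSV has a header column literally named "vendor".
        "Expression": f"SELECT * FROM s3object s WHERE s.\"vendor\" = '{vendor}'",
        "InputSerialization": {
            "CSV": {"FileHeaderInfo": "USE"},   # treat first row as header
            "CompressionType": "GZIP",          # S3 Select reads GZIP CSV
        },
        "OutputSerialization": {"CSV": {}},
    }

# To run for real (requires AWS credentials):
# import boto3
# s3 = boto3.client("s3")
# resp = s3.select_object_content(
#     **build_select_params("listings-bucket", "listings.csv.gz", "acme"))
# for event in resp["Payload"]:
#     if "Records" in event:
#         print(event["Records"]["Payload"].decode())
```

Note that `CompressionType` sits at the `InputSerialization` level, next to the `CSV` settings, which is what lets S3 Select scan the object without you decompressing it first.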

Comments

Paitan
Highly Voted 3 years, 7 months ago
Since we are talking about a compressed file, Amazon Glacier Select cannot be used. So we need to transfer the data to S3 and then use S3 Select. Option A is the right choice.
upvoted 39 times
Merrick
2 years, 4 months ago
https://docs.aws.amazon.com/ko_kr/AmazonS3/latest/userguide/selecting-content-from-objects.html
upvoted 3 times
...
[Removed]
3 years, 4 months ago
Archive objects that are queried by S3 Glacier Select must be formatted as uncompressed comma-separated values (CSV).
upvoted 6 times
iris22
3 years, 1 month ago
https://docs.aws.amazon.com/amazonglacier/latest/dev/glacier-select.html
upvoted 2 times
...
...
...
zeronine
Highly Voted 3 years, 7 months ago
My answer is A. (You might need Athena if the data were in multiple files, but in this question the data is in a single compressed file.)
upvoted 18 times
Marvel_jarvis
3 years, 5 months ago
Athena can't query Glacier data, so A can't be right.
upvoted 2 times
cnmc
3 years, 3 months ago
A doesn't mention Athena....
upvoted 3 times
strikeEagle
3 years, 2 months ago
Please read: "The file is hosted in Amazon S3 Glacier..."
upvoted 1 times
...
...
...
awssp12345
3 years, 7 months ago
Yes! I agree. Thank you.
upvoted 2 times
...
...
NarenKA
Most Recent 1 year, 2 months ago
Selected Answer: B
Glacier Select lets you run queries directly on data stored in S3 Glacier without restoring and moving the data to an active S3 storage class. The feature is designed for exactly this scenario, where you need to retrieve only a small subset of data from a large archive stored in Glacier, and it is more cost-effective. You are charged for the queries you run and the data retrieved, which, for a small subset of a 100 MB file, could be minimal. This avoids the cost of moving the data to Amazon S3 and storing it there for querying. A: restoring the file to S3 and then querying it with S3 Select incurs extra cost and processing time. C and D: Athena and Redshift Spectrum are powerful for analyzing large datasets, but they introduce unnecessary complexity and cost for this task given the relatively small dataset.
upvoted 1 times
...
teo2157
1 year, 5 months ago
Selected Answer: B
It's B; you can use Amazon Glacier Select to query archived data in Amazon Glacier. https://aws.amazon.com/about-aws/whats-new/2017/11/amazon-glacier-select-makes-big-data-analytics-of-archive-data-possible/?nc1=h_ls
upvoted 2 times
GCPereira
1 year, 4 months ago
accepted, this is an old question that does not reflect current standards of the
upvoted 1 times
...
...
chinmayj213
1 year, 7 months ago
It is a tricky question because nowadays Glacier Select supports compressed gzip CSV. But the problem is that the file was loaded a month ago, and deep archival has a limitation of one month to retrieve; otherwise we need to pay expedited retrieval fees.
upvoted 1 times
...
chinmayj213
1 year, 7 months ago
https://github.com/awsdocs/amazon-glacier-developer-guide/blob/master/doc_source/glacier-select.md
upvoted 1 times
...
confuzz
1 year, 9 months ago
Looks like an out-of-date question. Links to AWS docs posted earlier in this discussion now redirect and don't mention Glacier Select (I don't count blogs and unofficial resources). It's only S3 Select now, and it has no limit for the Amazon S3 Glacier Instant Retrieval storage class. https://docs.aws.amazon.com/AmazonS3/latest/userguide/selecting-content-from-objects.html There is no plain "S3 Glacier" anymore. The limitation "The archived objects that are being queried by the select request must be formatted as uncompressed comma-separated values (CSV) files" now turns up only for the Restore Object command in the SDK and CLI, which I guess is used to restore from Glacier. If I went back in time, I would go with A, since I believe people saw this uncompressed-only limitation for Glacier Select in the links posted before.
upvoted 5 times
confuzz
1 year, 9 months ago
Amazon S3 objects that are stored in the S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive storage classes are not immediately accessible. To access an object in these storage classes, you must restore a temporary copy of the object to its S3 bucket for a specified duration (number of days). https://docs.aws.amazon.com/AmazonS3/latest/userguide/restoring-objects.html
upvoted 2 times
...
...
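The restore step described in this thread can be sketched with boto3's `restore_object`, which creates a temporary copy of a Glacier-class object for a chosen number of days. The bucket and key names are hypothetical, and the live call is commented out since it requires AWS credentials.

```python
# Sketch of restoring a temporary copy of a Glacier-class object so it
# can then be queried in S3. Bucket/key names are hypothetical.

def build_restore_request(days=7, tier="Standard"):
    """Build the restore_object RestoreRequest payload."""
    return {
        "Days": days,  # how long the temporary copy stays available
        "GlacierJobParameters": {"Tier": tier},  # Expedited | Standard | Bulk
    }

# To run for real (requires AWS credentials):
# import boto3
# s3 = boto3.client("s3")
# s3.restore_object(
#     Bucket="listings-bucket",
#     Key="listings.csv.gz",
#     RestoreRequest=build_restore_request(days=7, tier="Standard"),
# )
```

The retrieval tier is the cost/latency trade-off mentioned above: Expedited is fastest and priciest, Bulk is slowest and cheapest.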
Parthasarathi
1 year, 10 months ago
Selected Answer: B
Amazon S3 Glacier Select works on objects using a subset of SQL against formats like CSV, JSON, or Apache Parquet. Objects compressed with GZIP or BZIP2 (for CSV and JSON objects only) and server-side encrypted objects can also be retrieved. Ref: https://www.scaler.com/topics/aws/s3-glacier-select/
upvoted 7 times
...
pk349
2 years ago
A: I passed the test
upvoted 1 times
...
flanfranco
2 years ago
Option A: https://aws.amazon.com/blogs/aws/s3-glacier-select/
upvoted 1 times
...
anjuvinayan
2 years ago
I have searched a lot and couldn't find a document stating that Glacier will not support compressed files. For me, Glacier Select is the first choice and then S3 Select, considering cost.
upvoted 3 times
...
thirstylion
2 years, 1 month ago
Answer: B. Nowhere have I read that Glacier Select cannot query compressed CSV files.
upvoted 3 times
...
cloudlearnerhere
2 years, 6 months ago
Selected Answer: A
The correct answer is A, as Amazon S3 Select enables querying S3 data on selected fields. Since S3 Glacier Select does not support compressed data, the file needs to be restored to S3 first. With Amazon S3 Select, you can use simple structured query language (SQL) statements to filter the contents of an Amazon S3 object and retrieve just the subset of data that you need. By using Amazon S3 Select to filter this data, you reduce the amount of data that Amazon S3 transfers, which reduces the cost and latency of retrieving it. Amazon S3 Select works on objects stored in CSV, JSON, or Apache Parquet format. It also works with objects that are compressed with GZIP or BZIP2 (for CSV and JSON objects only) and with server-side encrypted objects. You can specify the format of the results as either CSV or JSON, and you can determine how the records in the result are delimited. Option B is wrong because archive objects queried by S3 Glacier Select must be formatted as uncompressed comma-separated values (CSV). Options C and D are wrong because Athena and Redshift would add additional cost.
upvoted 4 times
Arumugam_S
1 year, 6 months ago
https://www.scaler.com/topics/aws/s3-glacier-select/ But in this document they mention that S3 Glacier Select supports the compressed gzip format.
upvoted 1 times
...
...
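As a purely local illustration of what S3 Select does with a gzip-compressed CSV as discussed above (decompress, read the header row, return only the matching records), here is a self-contained Python sketch; the `vendor` column and sample data are made up.

```python
# Local stand-in for S3 Select over a gzip-compressed CSV: decompress,
# parse with the header row, and keep only one vendor's records.
import csv
import gzip
import io

def filter_vendor(gz_bytes, vendor):
    """Return rows (as dicts) whose 'vendor' column matches."""
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", newline="") as f:
        return [row for row in csv.DictReader(f) if row["vendor"] == vendor]

# Build a tiny gzip-compressed CSV in memory and filter it.
data = "vendor,listing\nacme,house-1\nglobex,flat-2\nacme,house-3\n"
gz = gzip.compress(data.encode())
rows = filter_vendor(gz, "acme")
# rows holds the two 'acme' records
```

The point of S3 Select is that this decompress-and-filter work happens server-side, so only the matching subset crosses the network, which is where the cost savings in this question come from.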
Rejju
2 years, 6 months ago
A and B seem to be wrong according to the statement below: "Amazon S3 Select scan range requests support Parquet, CSV (without quoted delimiters), and JSON objects (in LINES mode only). CSV and JSON objects must be uncompressed. For line-based CSV and JSON objects, when a scan range is specified as part of the Amazon S3 Select request, all records that start within the scan range are processed. For Parquet objects, all of the row groups that start within the scan range requested are processed." D is costly, and hence the only feasible answer I see is C.
upvoted 1 times
...
Haimett
2 years, 6 months ago
Selected Answer: A
GZIP or BZIP2 - CSV and JSON files can be compressed using GZIP or BZIP2. GZIP and BZIP2 are the only compression formats that Amazon S3 Select supports for CSV and JSON files. Amazon S3 Select supports columnar compression for Parquet using GZIP or Snappy. Amazon S3 Select does not support whole-object compression for Parquet objects.
upvoted 1 times
...
Arka_01
2 years, 7 months ago
Selected Answer: A
Both Athena and Redshift are viable but far more costly than the S3 Select option. Glacier Select cannot query zipped data.
upvoted 1 times
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other