Exam AWS Certified Data Analytics - Specialty All Questions
Exam AWS Certified Data Analytics - Specialty topic 1 question 45 discussion

Once a month, a company receives a 100 MB .csv file compressed with gzip. The file contains 50,000 property listing records and is stored in Amazon S3 Glacier.
The company needs its data analyst to query a subset of the data for a specific vendor.
What is the most cost-effective solution?

  • A. Load the data into Amazon S3 and query it with Amazon S3 Select.
  • B. Query the data from Amazon S3 Glacier directly with Amazon Glacier Select.
  • C. Load the data to Amazon S3 and query it with Amazon Athena.
  • D. Load the data to Amazon S3 and query it with Amazon Redshift Spectrum.
Suggested Answer: A
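As a sketch of what answer A looks like in practice, the function below builds the parameters for boto3's `select_object_content` against a gzip-compressed CSV. The bucket, key, and `vendor` column name are hypothetical, and the live AWS call is left commented out since it needs credentials.

```python
# Sketch of answer A: after the archive is available in S3, query the
# gzip-compressed CSV with S3 Select. Names below are hypothetical.

def build_select_params(bucket, key, vendor):
    """Build select_object_content parameters that filter a
    gzip-compressed CSV by a 'vendor' header column."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        # Assumes the CSV has a header column literally named "vendor".
        "Expression": f"SELECT * FROM s3object s WHERE s.\"vendor\" = '{vendor}'",
        "InputSerialization": {
            "CSV": {"FileHeaderInfo": "USE"},   # treat first row as header
            "CompressionType": "GZIP",          # S3 Select reads GZIP CSV
        },
        "OutputSerialization": {"CSV": {}},
    }

# To run for real (requires AWS credentials):
# import boto3
# s3 = boto3.client("s3")
# resp = s3.select_object_content(
#     **build_select_params("listings-bucket", "listings.csv.gz", "acme"))
# for event in resp["Payload"]:
#     if "Records" in event:
#         print(event["Records"]["Payload"].decode())
```

Note that `CompressionType` sits at the `InputSerialization` level, next to the `CSV` settings, which is what lets S3 Select scan the object without you decompressing it first.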

Comments

Paitan
Highly Voted 3 years, 7 months ago
Since we are talking about a compressed file, Amazon Glacier Select cannot be used. So we need to transfer the data to S3 and then use S3 Select. Option A is the right choice.
upvoted 39 times
Merrick
2 years, 4 months ago
https://docs.aws.amazon.com/ko_kr/AmazonS3/latest/userguide/selecting-content-from-objects.html
upvoted 3 times
...
[Removed]
3 years, 4 months ago
Archive objects that are queried by S3 Glacier Select must be formatted as uncompressed comma-separated values (CSV).
upvoted 6 times
iris22
3 years, 1 month ago
https://docs.aws.amazon.com/amazonglacier/latest/dev/glacier-select.html
upvoted 2 times
...
...
...
zeronine
Highly Voted 3 years, 7 months ago
My answer is A. (You might need Athena if the data were in multiple files, but in this question the data is in a single compressed file.)
upvoted 18 times
Marvel_jarvis
3 years, 5 months ago
Athena can't query Glacier data, so A can't be right.
upvoted 2 times
cnmc
3 years, 3 months ago
A doesn't mention Athena....
upvoted 3 times
strikeEagle
3 years, 2 months ago
Please read: "The file is hosted in Amazon S3 Glacier..."
upvoted 1 times
...
...
...
awssp12345
3 years, 7 months ago
Yes! I agree. Thank you.
upvoted 2 times
...
...
NarenKA
Most Recent 1 year, 2 months ago
Selected Answer: B
Glacier Select lets you run queries directly on data stored in S3 Glacier without restoring and moving the data to an active S3 storage class. The feature is designed for exactly this scenario, where you need to retrieve only a small subset of data from a large archive stored in Glacier, and it is more cost-effective. You are charged for the queries you run and the data retrieved, which, for a small subset of a 100 MB file, could be minimal. This avoids the cost of moving the data to Amazon S3 and storing it there for querying. A: restoring the file to S3 and then querying it with S3 Select incurs extra cost and processing time. C and D: Athena and Redshift Spectrum are powerful for analyzing large datasets, but they introduce unnecessary complexity and cost for this task given the relatively small dataset.
upvoted 1 times
...
teo2157
1 year, 5 months ago
Selected Answer: B
It's B; you can use Amazon Glacier Select to query archived data in Amazon Glacier. https://aws.amazon.com/about-aws/whats-new/2017/11/amazon-glacier-select-makes-big-data-analytics-of-archive-data-possible/?nc1=h_ls
upvoted 2 times
GCPereira
1 year, 4 months ago
accepted, this is an old question that does not reflect current standards of the
upvoted 1 times
...
...
chinmayj213
1 year, 7 months ago
It is a tricky question because nowadays Glacier Select supports compressed gzip CSV. But the problem is that the file was loaded a month ago, and deep archival has a limitation of one month to retrieve; otherwise we need to pay expedited retrieval fees.
upvoted 1 times
...
chinmayj213
1 year, 7 months ago
https://github.com/awsdocs/amazon-glacier-developer-guide/blob/master/doc_source/glacier-select.md
upvoted 1 times
...
confuzz
1 year, 9 months ago
Looks like an out-of-date question. Links to AWS docs posted earlier in this discussion now redirect and don't mention Glacier Select (I don't count blogs and unofficial resources). It's only S3 Select now, and it has no limit for the Amazon S3 Glacier Instant Retrieval storage class. https://docs.aws.amazon.com/AmazonS3/latest/userguide/selecting-content-from-objects.html There is no plain "S3 Glacier" anymore. The limitation "The archived objects that are being queried by the select request must be formatted as uncompressed comma-separated values (CSV) files" now turns up only for the Restore Object command in the SDK and CLI, which I guess is used to restore from Glacier. If I went back in time, I would go with A, since I believe people saw this uncompressed-only limitation for Glacier Select in the links posted before.
upvoted 5 times
confuzz
1 year, 9 months ago
Amazon S3 objects that are stored in the S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive storage classes are not immediately accessible. To access an object in these storage classes, you must restore a temporary copy of the object to its S3 bucket for a specified duration (number of days). https://docs.aws.amazon.com/AmazonS3/latest/userguide/restoring-objects.html
upvoted 2 times
...
...
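The restore step described in this thread can be sketched with boto3's `restore_object`, which creates a temporary copy of a Glacier-class object for a chosen number of days. The bucket and key names are hypothetical, and the live call is commented out since it requires AWS credentials.

```python
# Sketch of restoring a temporary copy of a Glacier-class object so it
# can then be queried in S3. Bucket/key names are hypothetical.

def build_restore_request(days=7, tier="Standard"):
    """Build the restore_object RestoreRequest payload."""
    return {
        "Days": days,  # how long the temporary copy stays available
        "GlacierJobParameters": {"Tier": tier},  # Expedited | Standard | Bulk
    }

# To run for real (requires AWS credentials):
# import boto3
# s3 = boto3.client("s3")
# s3.restore_object(
#     Bucket="listings-bucket",
#     Key="listings.csv.gz",
#     RestoreRequest=build_restore_request(days=7, tier="Standard"),
# )
```

The retrieval tier is the cost/latency trade-off mentioned above: Expedited is fastest and priciest, Bulk is slowest and cheapest.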
Parthasarathi
1 year, 10 months ago
Selected Answer: B
Amazon S3 Glacier Select works on objects using a subset of SQL against formats like CSV, JSON, or Apache Parquet. Objects compressed with GZIP or BZIP2 (for CSV and JSON objects only) and server-side encrypted objects can also be retrieved. Ref: https://www.scaler.com/topics/aws/s3-glacier-select/
upvoted 7 times
...
pk349
2 years ago
A: I passed the test
upvoted 1 times
...
flanfranco
2 years ago
Option A: https://aws.amazon.com/blogs/aws/s3-glacier-select/
upvoted 1 times
...
anjuvinayan
2 years ago
I have searched a lot and couldn't find a document stating that Glacier will not support compressed files. For me, Glacier Select is the first choice and then S3 Select, considering cost.
upvoted 3 times
...
thirstylion
2 years, 1 month ago
Answer: B. Nowhere have I read that Glacier Select cannot query compressed CSV files.
upvoted 3 times
...
cloudlearnerhere
2 years, 6 months ago
Selected Answer: A
The correct answer is A, as Amazon S3 Select enables querying S3 data on selected fields. Since S3 Glacier Select does not support compressed data, the file needs to be restored to S3 first. With Amazon S3 Select, you can use simple structured query language (SQL) statements to filter the contents of an Amazon S3 object and retrieve just the subset of data that you need. By using Amazon S3 Select to filter this data, you reduce the amount of data that Amazon S3 transfers, which reduces the cost and latency of retrieving it. Amazon S3 Select works on objects stored in CSV, JSON, or Apache Parquet format. It also works with objects that are compressed with GZIP or BZIP2 (for CSV and JSON objects only) and with server-side encrypted objects. You can specify the format of the results as either CSV or JSON, and you can determine how the records in the result are delimited. Option B is wrong because archive objects queried by S3 Glacier Select must be formatted as uncompressed comma-separated values (CSV). Options C and D are wrong because Athena and Redshift would add additional cost.
upvoted 4 times
Arumugam_S
1 year, 6 months ago
https://www.scaler.com/topics/aws/s3-glacier-select/ But in this document they mention that S3 Glacier Select supports the compressed gzip format.
upvoted 1 times
...
...
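As a purely local illustration of what S3 Select does with a gzip-compressed CSV as discussed above (decompress, read the header row, return only the matching records), here is a self-contained Python sketch; the `vendor` column and sample data are made up.

```python
# Local stand-in for S3 Select over a gzip-compressed CSV: decompress,
# parse with the header row, and keep only one vendor's records.
import csv
import gzip
import io

def filter_vendor(gz_bytes, vendor):
    """Return rows (as dicts) whose 'vendor' column matches."""
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", newline="") as f:
        return [row for row in csv.DictReader(f) if row["vendor"] == vendor]

# Build a tiny gzip-compressed CSV in memory and filter it.
data = "vendor,listing\nacme,house-1\nglobex,flat-2\nacme,house-3\n"
gz = gzip.compress(data.encode())
rows = filter_vendor(gz, "acme")
# rows holds the two 'acme' records
```

The point of S3 Select is that this decompress-and-filter work happens server-side, so only the matching subset crosses the network, which is where the cost savings in this question come from.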
Rejju
2 years, 6 months ago
A and B seem to be wrong according to the statement below: "Amazon S3 Select scan range requests support Parquet, CSV (without quoted delimiters), and JSON objects (in LINES mode only). CSV and JSON objects must be uncompressed. For line-based CSV and JSON objects, when a scan range is specified as part of the Amazon S3 Select request, all records that start within the scan range are processed. For Parquet objects, all of the row groups that start within the scan range requested are processed." D is costly, and hence the only feasible answer I see is C.
upvoted 1 times
...
Haimett
2 years, 6 months ago
Selected Answer: A
GZIP or BZIP2 - CSV and JSON files can be compressed using GZIP or BZIP2. GZIP and BZIP2 are the only compression formats that Amazon S3 Select supports for CSV and JSON files. Amazon S3 Select supports columnar compression for Parquet using GZIP or Snappy. Amazon S3 Select does not support whole-object compression for Parquet objects.
upvoted 1 times
...
Arka_01
2 years, 7 months ago
Selected Answer: A
Both Athena and Redshift are viable but far more costly than the S3 Select option. Glacier Select cannot query zipped data.
upvoted 1 times
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other