Exam AWS Certified Data Analytics - Specialty topic 1 question 125 discussion

An education provider's learning management system (LMS) is hosted in a 100 TB data lake that is built on Amazon S3. The provider's LMS supports hundreds of schools. The provider wants to build an advanced analytics reporting platform using Amazon Redshift to handle complex queries with optimal performance.
System users will query the most recent 4 months of data 95% of the time while 5% of the queries will leverage data from the previous 12 months.
Which solution meets these requirements in the MOST cost-effective way?

  • A. Store the most recent 4 months of data in the Amazon Redshift cluster. Use Amazon Redshift Spectrum to query data in the data lake. Use S3 lifecycle management rules to store data from the previous 12 months in Amazon S3 Glacier storage.
  • B. Leverage DS2 nodes for the Amazon Redshift cluster. Migrate all data from Amazon S3 to Amazon Redshift. Decommission the data lake.
  • C. Store the most recent 4 months of data in the Amazon Redshift cluster. Use Amazon Redshift Spectrum to query data in the data lake. Ensure the S3 Standard storage class is in use with objects in the data lake.
  • D. Store the most recent 4 months of data in the Amazon Redshift cluster. Use Amazon Redshift federated queries to join cluster data with the data lake to reduce costs. Ensure the S3 Standard storage class is in use with objects in the data lake.
Suggested Answer: C
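As a rough illustration of the tiered design in option C, the routing rule can be sketched in Python: queries touching only the most recent 4 months are served by local Redshift tables, while anything older is scanned in place on S3 via a Redshift Spectrum external schema. The function and tier names below are hypothetical, and the 4-month window is approximated as 120 days.

```python
from datetime import date, timedelta

# Sketch of option C's tiering rule (hypothetical names): the most recent
# ~4 months live in local Redshift tables; older data stays in the S3 data
# lake and is scanned by a Redshift Spectrum external table.
HOT_WINDOW_DAYS = 4 * 30  # ~4 months kept in the cluster

def pick_tier(oldest_date_needed: date, today: date) -> str:
    """Return which storage tier can satisfy a query."""
    hot_cutoff = today - timedelta(days=HOT_WINDOW_DAYS)
    if oldest_date_needed >= hot_cutoff:
        return "redshift_local"      # ~95% of queries land here
    return "spectrum_external"       # remaining ~5% scan S3 directly

today = date(2024, 6, 1)
print(pick_tier(date(2024, 4, 1), today))  # -> redshift_local
print(pick_tier(date(2023, 9, 1), today))  # -> spectrum_external
```

In practice the split is transparent to users: Spectrum external tables and local tables can be queried, and even joined, in the same SQL statement, so no application-level routing is required.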

Comments

srinivasa
Highly Voted 3 years, 9 months ago
Answer: C
upvoted 16 times
...
cloudlearnerhere
Highly Voted 2 years, 8 months ago
The correct answer is C. Since 95% of queries need only the last 4 months of data, that data can be stored in the Redshift cluster. The full 12 months of data can remain in S3 and be queried with Redshift Spectrum. This mix of S3 and Redshift is the most cost-effective option. https://docs.aws.amazon.com/redshift/latest/dg/c-using-spectrum.html Option A is wrong because Redshift Spectrum cannot query data in S3 Glacier storage. Option B is wrong because keeping all the data in Redshift is not cost-effective. Option D is wrong because, although Redshift federated queries would work, for the 5% of queries it is more cost-effective to query S3 directly than to join cluster data with S3.
upvoted 9 times
...
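For reference, the lifecycle rule that option A proposes would look roughly like the configuration below, written in the shape that boto3's `put_bucket_lifecycle_configuration` expects (the rule ID and prefix are hypothetical). The comment above's objection is that once objects transition to the GLACIER storage class, Redshift Spectrum can no longer scan them, so the 5% of queries over older data would fail.

```python
# Hypothetical S3 lifecycle configuration in the shape expected by
# boto3's put_bucket_lifecycle_configuration. Transitioning lake objects
# to GLACIER (as option A suggests) makes them unreadable to Redshift
# Spectrum, which only queries standard-access S3 storage classes.
lifecycle_config = {
    "Rules": [
        {
            "ID": "archive-older-than-4-months",   # hypothetical rule ID
            "Status": "Enabled",
            "Filter": {"Prefix": "lms-data/"},     # hypothetical prefix
            "Transitions": [
                {"Days": 120, "StorageClass": "GLACIER"}
            ],
        }
    ]
}

rule = lifecycle_config["Rules"][0]
print(rule["Transitions"][0]["StorageClass"])  # -> GLACIER
```

This is why option C keeps the lake objects in S3 Standard: the older data stays directly queryable by Spectrum, at the cost of slightly higher storage prices than Glacier.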
chinmayj213
Most Recent 1 year, 4 months ago
Option A cannot be the answer because 5% of queries still use the older data regularly, and Glacier requires a 90-to-180-day minimum storage period. Option D: federated queries make sense when there are multiple data sources, since they require extra authentication and authorization setup; here we only have S3, so we can go with option C.
upvoted 2 times
...
roymunson
1 year, 7 months ago
I don't get it. The question says: "System users will query the most recent 4 months of data 95% of the time while 5% of the queries will leverage data from the previous 12 months." So 95% of queries use data from the previous 4 months, 5% use data from the previous 12 months, and 0% use data older than 12 months. So why not archive the older data?
upvoted 1 times
roymunson
1 year, 7 months ago
Ah, now I get it: they want to use Glacier to store the data from the previous 12 months.
upvoted 1 times
...
...
pk349
2 years, 2 months ago
C: I passed the test
upvoted 1 times
...
Arjun777
2 years, 4 months ago
Option C suggests storing the most recent 4 months of data in the Amazon Redshift cluster and using Amazon Redshift Spectrum to query the data lake, with the S3 Standard storage class on the lake objects. While this approach would work, it may not be the most cost-effective way to meet the requirements, because storing data in the Redshift cluster is more expensive than storing it in S3. Additionally, by keeping all data in the data lake, you may be able to use other data analysis services to query it, which can be cheaper than using Amazon Redshift. Therefore option A, which uses Amazon Redshift Spectrum to query the data lake and S3 lifecycle management rules to move data from the previous 12 months to Amazon S3 Glacier, is likely a more cost-effective solution.
upvoted 4 times
...
rocky48
2 years, 11 months ago
Selected Answer: C
upvoted 1 times
...
Alekx42
2 years, 12 months ago
Selected Answer: C
C is the answer. If you have to join old data coming from S3 with new data coming from the Redshift cluster, you can do that. It is described here: https://catalog.us-east-1.prod.workshops.aws/workshops/e5548031-3004-49ad-89be-a13e8cd616f6/en-US/perform-analytics-on-your-data/join-and-query-data-with-redshift-spectrum
upvoted 1 times
...
GiveMeEz
3 years ago
Not D. https://docs.aws.amazon.com/redshift/latest/dg/federated-limitations.html
upvoted 1 times
...
Bik000
3 years, 1 month ago
Selected Answer: C
Answer C should be correct
upvoted 1 times
...
certificationJunkie
3 years, 1 month ago
C. There is no mention of joining old and new data. Hence no need for federated queries.
upvoted 1 times
certificationJunkie
3 years, 1 month ago
Federated queries are for databases and are not specific to S3. The requirement here is S3, so Spectrum works fine.
upvoted 1 times
...
...
Shammy45
3 years, 1 month ago
Selected Answer: C
It's C, a textbook case for Spectrum.
upvoted 1 times
...
MWL
3 years, 2 months ago
Selected Answer: D
I think D is correct. Using federated queries to combine Redshift and Spectrum (over S3) will be more cost-effective.
upvoted 1 times
...
yogen
3 years, 6 months ago
A: data older than 12 months is not required, so Glacier is the most cost-effective for it. Redshift Spectrum is used for querying data that is 4 to 12 months old, and data up to 4 months old is queried from the Redshift cluster.
upvoted 2 times
yogen
3 years, 6 months ago
I correct myself: I misread it as moving the data from the last 12 months to Glacier. The answer is C.
upvoted 2 times
...
...
damaldon
3 years, 7 months ago
The question asks to "efficiently handle complicated queries". Option A recommends an S3 lifecycle rule, but I don't think Glacier would be efficient, even with expedited retrieval (1-5 minutes). I will go for C.
upvoted 3 times
...
tobsam
3 years, 7 months ago
ali98 on point. Answer is C
upvoted 1 times
...
awsmani
3 years, 7 months ago
Isn't it D? The 5% of users should be able to query both. Moving the 4 months of data to Redshift makes sense, but for the 12 months of data you need to query Redshift plus the data lake, and federated queries can help with that. If option C had stated that Redshift Spectrum queries S3 as an external schema, then choosing C would make sense, but I don't read it that way.
upvoted 4 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other