Exam AWS Certified Data Analytics - Specialty topic 1 question 26 discussion

Exam question from Amazon's AWS Certified Data Analytics - Specialty

Question #: 26
Topic #: 1

An ecommerce company stores customer purchase data in Amazon RDS. The company wants a solution to store and analyze historical data. The most recent 6 months of data will be queried frequently for analytics workloads. This data is several terabytes large. Once a month, historical data for the last 5 years must be accessible and will be joined with the more recent data. The company wants to optimize performance and cost.
Which storage solution will meet these requirements?

A. Create a read replica of the RDS database to store the most recent 6 months of data. Copy the historical data into Amazon S3. Create an AWS Glue Data Catalog of the data in Amazon S3 and Amazon RDS. Run historical queries using Amazon Athena.
B. Use an ETL tool to incrementally load the most recent 6 months of data into an Amazon Redshift cluster. Run more frequent queries against this cluster. Create a read replica of the RDS database to run queries on the historical data.
C. Incrementally copy data from Amazon RDS to Amazon S3. Create an AWS Glue Data Catalog of the data in Amazon S3. Use Amazon Athena to query the data.
D. Incrementally copy data from Amazon RDS to Amazon S3. Load and store the most recent 6 months of data in Amazon Redshift. Configure an Amazon Redshift Spectrum table to connect to all historical data.

Suggested Answer: D 🗳️

by Saaho at Aug. 20, 2020, 5:31 p.m.

Disclaimers:

- ExamTopics website is not related to, affiliated with, endorsed or authorized by Amazon.
- Trademarks, certification & product names are used for reference only and belong to Amazon.

Comments

abhineet

Highly Voted 3 years, 7 months ago

D seems correct

upvoted 20 times

...

Shraddha

Highly Voted 3 years, 6 months ago

Answer: D. Note: A and B are immediately out because RDS is not meant for analytics. C and D both work, but D balances performance and cost. C may cost less (depending on data compression and query frequency), but queries on the recent data will be slower.

upvoted 17 times

Donell

3 years, 6 months ago

Correct, answer is D. Redshift is suitable for running complex analytical queries. Athena is suitable for small ad-hoc queries.

upvoted 6 times

...

GCPereira

Most Recent 1 year, 4 months ago

When a question says "analytics queries", "analytics workloads" or "complex analysis", in 80% of cases the answer is Redshift. With a short, frequently analyzed window (6 months in this case), Redshift is the better option. RDS stays the relational database and should not run the analytics queries.

upvoted 1 times

...

pk349

2 years ago

D: I passed the test

upvoted 2 times

Espa

1 year, 12 months ago

I see you have been posting only I passed the test :)

upvoted 5 times

DipeshGandhi131

1 year, 10 months ago

hahah!!

upvoted 2 times

...

AwsNewPeople

2 years, 2 months ago

Selected Answer: D

Option D is the most suitable solution for this scenario. Explanation:
- Incrementally copy data from Amazon RDS to Amazon S3: this stores the historical data cost-effectively, while the frequently queried recent data is served from Redshift.
- Load and store the most recent 6 months of data in Amazon Redshift: this gives good performance for the frequent queries on the most recent data.
- Configure an Amazon Redshift Spectrum table to connect to all historical data: this lets the historical data in S3 be joined with the more recent data in Redshift, providing the required analysis capability.
Option A does not address the requirement to optimize performance for querying the most recent data. Option B runs the historical queries on an RDS read replica, which may not be efficient for analytical queries over 5 years of data. Option C also does not provide a performant solution for frequent querying of the most recent data.
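To make the Spectrum step more concrete, here is a minimal Python sketch that only builds the SQL one might run in Redshift. The `CREATE EXTERNAL SCHEMA ... FROM DATA CATALOG` form is Redshift syntax; every schema, table, and column name and the IAM role ARN below are hypothetical placeholders, not something given in the question.

```python
def spectrum_setup_sql(schema: str, glue_database: str, iam_role_arn: str) -> str:
    """Build DDL that exposes a Glue Data Catalog database to Redshift
    as an external (Spectrum) schema. Plain string formatting is used for
    illustration only; do not pass untrusted input to it."""
    return (
        f"CREATE EXTERNAL SCHEMA {schema}\n"
        "FROM DATA CATALOG\n"
        f"DATABASE '{glue_database}'\n"
        f"IAM_ROLE '{iam_role_arn}';"
    )


def monthly_report_sql(local_table: str, external_table: str) -> str:
    """Build a query that joins recent purchases stored in Redshift with
    per-customer totals computed over the historical data in S3.
    customer_id and amount are assumed column names."""
    return (
        "SELECT r.customer_id,\n"
        "       SUM(r.amount) AS recent_total,\n"
        "       h.historical_total\n"
        f"FROM {local_table} AS r\n"
        "JOIN (\n"
        "    SELECT customer_id, SUM(amount) AS historical_total\n"
        f"    FROM {external_table}\n"
        "    GROUP BY customer_id\n"
        ") AS h ON h.customer_id = r.customer_id\n"
        "GROUP BY r.customer_id, h.historical_total;"
    )


if __name__ == "__main__":
    # Hypothetical names, shown only to illustrate the shape of the SQL.
    print(spectrum_setup_sql(
        "spectrum_hist",
        "purchases_catalog",
        "arn:aws:iam::123456789012:role/ExampleSpectrumRole",
    ))
    print(monthly_report_sql("public.recent_purchases", "spectrum_hist.purchases"))
```

The point of the sketch is that the monthly join is a single Redshift query: the local table holds the hot 6 months, and the external table reads the 5 years of history from S3 only when that query runs.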

upvoted 2 times

...

rags1482

2 years, 2 months ago

Option D suggests copying data from RDS to S3 incrementally, storing the most recent 6 months of data in Amazon Redshift, and configuring an Amazon Redshift Spectrum table to connect to all historical data. This approach lets the company optimize cost and performance, as Redshift is a cost-effective data warehousing solution that can handle large volumes of data. Additionally, Redshift Spectrum lets the company query the recent and historical data sets together in a single query. Option A suggests creating a read replica of the RDS database to store the most recent 6 months of data and copying the historical data into Amazon S3. This approach keeps the frequently queried data in RDS, which is not built for analytics workloads, and may result in increased query latency.

upvoted 1 times

...

murali12180

2 years, 3 months ago

Selected Answer: A

A. Moving the data to S3 and using a Glue Data Catalog that carries both the RDS and S3 schemas will let them use the same schema for queries. Remember, the requirement says "low cost", so Redshift is out of the picture.

upvoted 1 times

aws_kid

2 years, 2 months ago

I don't think a read replica can be created for only certain months of data. A read replica replicates the entire DB, so A is unlikely to be the answer.

upvoted 1 times

...

BtotheJ

2 years, 3 months ago

Selected Answer: D

D for the win

upvoted 1 times

...

cloudlearnerhere

2 years, 6 months ago

D is the right answer, as loading and querying the most recent 6 months of data in Redshift gives better performance, and the older data can be queried via Redshift Spectrum. C is wrong: although it is possible to query the entire data set in S3 using Athena, it will not match the performance Redshift offers for querying the last six months of data, so this option is not the best fit for the given use case. Options A & B are wrong, as RDS is not an ideal solution to store and query historical data. Also, the 6 months of data may be several terabytes large.
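A rough sketch of the hot/cold split described above, in Python. It assumes the 6-month window is cut at calendar-month boundaries, which the question does not specify, and the tier labels are made up for illustration. In option D all data is copied to S3 anyway; the function only says which copy a query for a given purchase date would normally hit.

```python
from datetime import date


def months_back(day: date, months: int) -> date:
    """Return the first day of the month that lies `months` months
    before the month containing `day`."""
    index = day.year * 12 + (day.month - 1) - months
    return date(index // 12, index % 12 + 1, 1)


def query_tier(purchase_date: date, today: date, hot_months: int = 6) -> str:
    """Classify a purchase date as served from the local Redshift table
    (recent data) or from the Spectrum external table over S3 (history)."""
    cutoff = months_back(today, hot_months)
    return "redshift_local" if purchase_date >= cutoff else "spectrum_external"


if __name__ == "__main__":
    today = date(2024, 3, 15)
    for d in (date(2024, 1, 10), date(2023, 9, 1), date(2023, 8, 31), date(2020, 5, 5)):
        print(d.isoformat(), query_tier(d, today))
```

An incremental load job could use the same cutoff to decide which rows to keep in the Redshift table and which to drop from it once they age out, since the S3 copy remains queryable through Spectrum.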

upvoted 3 times

...

aefuen1

2 years, 6 months ago

Selected Answer: D

D seems correct

upvoted 1 times

...

rocky48

2 years, 10 months ago

Selected Answer: D

Answer-D

upvoted 1 times

...

jrheen

3 years ago

Answer-D

upvoted 1 times

...

simonaque

3 years, 1 month ago

Selected Answer: D

D seems correct

upvoted 1 times

...

ShilaP

3 years, 1 month ago

D is correct...

upvoted 1 times

...

Agn3001

3 years, 2 months ago

Selected Answer: D

An effective way to query across the data in S3 and the data loaded from RDS into Redshift is Redshift Spectrum.

upvoted 1 times

...

umatrilok

3 years, 4 months ago

Historical Data points to Redshift Spectrum. Hence D

upvoted 1 times

...

aws2019

3 years, 5 months ago

answer is D.

upvoted 1 times

...
