Exam AWS Certified Data Analytics - Specialty topic 1 question 26 discussion

An ecommerce company stores customer purchase data in Amazon RDS. The company wants a solution to store and analyze historical data. The most recent 6 months of data will be queried frequently for analytics workloads. This data is several terabytes large. Once a month, historical data for the last 5 years must be accessible and will be joined with the more recent data. The company wants to optimize performance and cost.
Which storage solution will meet these requirements?

  • A. Create a read replica of the RDS database to store the most recent 6 months of data. Copy the historical data into Amazon S3. Create an AWS Glue Data Catalog of the data in Amazon S3 and Amazon RDS. Run historical queries using Amazon Athena.
  • B. Use an ETL tool to incrementally load the most recent 6 months of data into an Amazon Redshift cluster. Run more frequent queries against this cluster. Create a read replica of the RDS database to run queries on the historical data.
  • C. Incrementally copy data from Amazon RDS to Amazon S3. Create an AWS Glue Data Catalog of the data in Amazon S3. Use Amazon Athena to query the data.
  • D. Incrementally copy data from Amazon RDS to Amazon S3. Load and store the most recent 6 months of data in Amazon Redshift. Configure an Amazon Redshift Spectrum table to connect to all historical data.
Suggested Answer: D

Comments

abhineet
Highly Voted 3 years, 7 months ago
D seems correct
upvoted 20 times
Shraddha
Highly Voted 3 years, 6 months ago
Answer: D. Note: A and B are immediately out because RDS is not meant for analytics. C and D both work, but D balances performance and cost. C may cost less (depending on data compression and query frequency), but queries against the recent data will be slower.
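To make the "depending on data compression, frequency of queries" point concrete, here is a rough back-of-the-envelope sketch of pay-per-scan cost. The per-TB price and all workload numbers are assumptions for illustration only, not figures from the question; check current AWS pricing for your region.

```python
# Assumption: illustrative per-TB-scanned rate for a pay-per-scan engine
# such as Athena or Redshift Spectrum. Not an authoritative price.
ASSUMED_PRICE_PER_TB_USD = 5.00


def monthly_scan_cost(tb_scanned_per_query: float,
                      queries_per_month: int,
                      compression_ratio: float = 1.0,
                      price_per_tb: float = ASSUMED_PRICE_PER_TB_USD) -> float:
    """Estimate the monthly cost of a pay-per-scan workload.

    compression_ratio is raw size / stored size, so 4.0 means the
    stored files are a quarter of the raw size.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    effective_tb = tb_scanned_per_query / compression_ratio
    return effective_tb * queries_per_month * price_per_tb


# Hypothetical workload: 3 TB of recent data, scanned by 200 queries a month.
uncompressed = monthly_scan_cost(3.0, 200)
compressed = monthly_scan_cost(3.0, 200, compression_ratio=4.0)
print(uncompressed, compressed)
```

Under these assumed numbers the scan cost grows linearly with query count, which is why frequent queries on the recent data favor a provisioned Redshift cluster, while the once-a-month historical scan stays cheap on S3.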
upvoted 17 times
Donell
3 years, 6 months ago
Correct, answer is D. Redshift is suitable for running complex analytical queries. Athena is suitable for small ad-hoc queries.
upvoted 6 times
GCPereira
Most Recent 1 year, 4 months ago
When we say "analytics queries", "analytics workloads", or "complex analysis", in 80% of cases we mean Redshift. Since the frequently analyzed period is short (6 months in this case), Redshift is the better option. RDS should stay a relational database; don't run analytics queries on it.
upvoted 1 times
pk349
2 years ago
D: I passed the test
upvoted 2 times
Espa
1 year, 12 months ago
I see you have only been posting "I passed the test" :)
upvoted 5 times
DipeshGandhi131
1 year, 10 months ago
hahah!!
upvoted 2 times
AwsNewPeople
2 years, 2 months ago
Selected Answer: D
Option D is the most suitable solution for this scenario. Explanation:
Incrementally copy data from Amazon RDS to Amazon S3: This stores the historical data cost-effectively and moves the analytics workload off the RDS database.
Load and store the most recent 6 months of data in Amazon Redshift: This provides a performant solution for the frequent queries on the most recent data.
Configure an Amazon Redshift Spectrum table to connect to all historical data: This enables joining the historical data in S3 with the more recent data in Redshift, providing the required analysis capability.
Option A does not address the requirement to optimize performance for querying the most recent data. Option B involves creating a read replica of RDS, which is not efficient for analytical queries over 5 years of historical data. Option C also does not provide a performant solution for frequent querying of the most recent data.
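A minimal sketch of what the Redshift Spectrum part of option D could look like, written as Python helpers that build the SQL. Every identifier here (schema, Glue database, table and column names, and the IAM role ARN) is hypothetical and only for illustration; the statements would still need to be run against a real cluster.

```python
def external_schema_sql(schema: str, catalog_db: str, iam_role_arn: str) -> str:
    """Build the statement that exposes a Glue Data Catalog database
    (the historical data in S3) to Redshift as an external schema."""
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema}\n"
        f"FROM DATA CATALOG\n"
        f"DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role_arn}';"
    )


def monthly_join_sql(recent_table: str, historical_table: str) -> str:
    """Build a query that joins recent rows stored in Redshift with a
    per-customer aggregate computed over the historical data in S3."""
    return (
        "SELECT r.customer_id, r.amount AS recent_amount, h.lifetime_amount\n"
        f"FROM {recent_table} AS r\n"
        "JOIN (\n"
        "    SELECT customer_id, SUM(amount) AS lifetime_amount\n"
        f"    FROM {historical_table}\n"
        "    GROUP BY customer_id\n"
        ") AS h ON h.customer_id = r.customer_id;"
    )


# Hypothetical names: none of these come from the question.
print(external_schema_sql(
    "spectrum_hist",
    "purchases_archive",
    "arn:aws:iam::123456789012:role/ExampleSpectrumRole",
))
print(monthly_join_sql("public.purchases_recent", "spectrum_hist.purchases"))
```

The frequent queries only touch the local table, so they never pay for an S3 scan; the external table is read only by the monthly join.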
upvoted 2 times
rags1482
2 years, 2 months ago
Option D suggests copying data from RDS to S3 incrementally, storing the most recent 6 months of data in Amazon Redshift, and configuring an Amazon Redshift Spectrum table to connect to all historical data. This approach lets the company optimize cost and performance, as Redshift is a cost-effective data warehousing solution that can handle large volumes of data. Additionally, Redshift Spectrum lets the company query the recent and historical data sets together in a single query. Option A suggests creating a read replica of the RDS database to store the most recent 6 months of data and copying the historical data into Amazon S3. This approach keeps the frequently queried data on RDS and may result in increased query latency.
upvoted 1 times
murali12180
2 years, 3 months ago
Selected Answer: A
A. Moving the data to S3 and using a Glue Data Catalog that carries both the RDS and S3 schemas lets them use the same schema for queries. Remember the requirement says "low cost", so Redshift is out of the picture.
upvoted 1 times
aws_kid
2 years, 2 months ago
I don't think a read replica can be limited to certain months; a read replica replicates the entire database. A is unlikely to be the answer.
upvoted 1 times
BtotheJ
2 years, 3 months ago
Selected Answer: D
D for the win
upvoted 1 times
cloudlearnerhere
2 years, 6 months ago
D is the right answer, as loading and querying the most recent 6 months of data in Redshift gives better performance, and the older data can be queried via Redshift Spectrum. C is wrong: although it's possible to query all of the data in S3 using Athena, it will not match the performance Redshift offers for querying the last six months of data, so this option is not the best fit for the given use case. Options A and B are wrong because RDS is not an ideal solution for storing and querying historical data. Also, the 6 months of data may be several terabytes large.
upvoted 3 times
aefuen1
2 years, 6 months ago
Selected Answer: D
D seems correct
upvoted 1 times
rocky48
2 years, 10 months ago
Selected Answer: D
Answer-D
upvoted 1 times
jrheen
3 years ago
Answer-D
upvoted 1 times
simonaque
3 years, 1 month ago
Selected Answer: D
D seems correct
upvoted 1 times
ShilaP
3 years, 1 month ago
D is correct...
upvoted 1 times
Agn3001
3 years, 2 months ago
Selected Answer: D
An effective way to query across the data in S3 and the data loaded from RDS into Redshift is to use Redshift Spectrum.
upvoted 1 times
umatrilok
3 years, 4 months ago
Historical Data points to Redshift Spectrum. Hence D
upvoted 1 times
aws2019
3 years, 5 months ago
answer is D.
upvoted 1 times