
Exam AWS Certified Solutions Architect - Professional topic 1 question 496 discussion

A company collects a steady stream of 10 million data records from 100,000 sources each day. These records are written to an Amazon RDS MySQL DB. A query must produce the daily average of a data source over the past 30 days. There are twice as many reads as writes. Queries to the collected data are for one source ID at a time.
How can the Solutions Architect improve the reliability and cost effectiveness of this solution?

  • A. Use Amazon Aurora with MySQL in a Multi-AZ mode. Use four additional read replicas.
  • B. Use Amazon DynamoDB with the source ID as the partition key and the timestamp as the sort key. Use a Time to Live (TTL) to delete data after 30 days.
  • C. Use Amazon DynamoDB with the source ID as the partition key. Use a different table each day.
  • D. Ingest data into Amazon Kinesis using a retention period of 30 days. Use AWS Lambda to write data records to Amazon ElastiCache for read access.
Suggested Answer: B
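For illustration only, here is a minimal boto3 sketch of the table design that option B describes: source ID as the partition key, timestamp as the sort key, and TTL enabled on an expiry attribute. All names (SensorRecords, source_id, ts, expire_at) are invented for the example.

    import boto3

    dynamodb = boto3.client("dynamodb")

    # Hypothetical table: partition key = source ID, sort key = record timestamp.
    dynamodb.create_table(
        TableName="SensorRecords",
        AttributeDefinitions=[
            {"AttributeName": "source_id", "AttributeType": "S"},
            {"AttributeName": "ts", "AttributeType": "N"},
        ],
        KeySchema=[
            {"AttributeName": "source_id", "KeyType": "HASH"},
            {"AttributeName": "ts", "KeyType": "RANGE"},
        ],
        BillingMode="PAY_PER_REQUEST",
    )
    dynamodb.get_waiter("table_exists").wait(TableName="SensorRecords")

    # TTL on a numeric epoch attribute; DynamoDB deletes expired items in the background.
    dynamodb.update_time_to_live(
        TableName="SensorRecords",
        TimeToLiveSpecification={"Enabled": True, "AttributeName": "expire_at"},
    )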

Comments

Moon
Highly Voted 3 years, 9 months ago
I would go with "B". A would be preferred if there were no replicas, because four replicas will make it a costly solution. B: TTL with DynamoDB will solve the database size cost and make it cost effective. C is not a good solution. D: Kinesis can store data for up to 7 days.
upvoted 39 times
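As a rough sketch of the TTL-based cleanup Moon describes (reusing the illustrative table and attribute names from the sketch above), each record would be written with an expiry timestamp roughly 30 days in the future; DynamoDB then removes it automatically, at no extra cost.

    import time
    import boto3

    table = boto3.resource("dynamodb").Table("SensorRecords")  # illustrative name

    now = int(time.time())
    table.put_item(
        Item={
            "source_id": "source-00042",         # partition key
            "ts": now,                           # sort key, epoch seconds
            "value": 17,                         # the measured data point (int/Decimal, not float)
            "expire_at": now + 30 * 24 * 3600,   # TTL attribute: item expires ~30 days after write
        }
    )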
Moon
3 years, 8 months ago
Also, A does not mention any deletion of the old data, which will be much more costly!
upvoted 4 times
G3
3 years, 8 months ago
Deletion of data is not a requirement as per the question. It only says the query needs to produce results using the last 30 days. There are twice as many reads as writes. I would go for B.
upvoted 3 times
Firststack
3 years, 8 months ago
Amazon Kinesis Data Streams supports changes to the data record retention period of your data stream. A Kinesis data stream is an ordered sequence of data records meant to be written to and read from in real time. Data records are therefore stored in shards in your stream temporarily. The time period from when a record is added to when it is no longer accessible is called the retention period. A Kinesis data stream stores records for 24 hours by default, and up to 8760 hours (365 days). https://docs.aws.amazon.com/streams/latest/dev/kinesis-extended-retention.html
upvoted 1 times
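For reference, the retention period described here is changed with a single API call; a hedged boto3 example with a hypothetical stream name:

    import boto3

    kinesis = boto3.client("kinesis")

    # Raise retention of a hypothetical stream from the 24-hour default to 30 days.
    kinesis.increase_stream_retention_period(
        StreamName="sensor-records",
        RetentionPeriodHours=30 * 24,
    )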
donathon
Highly Voted 3 years, 9 months ago
A. A: Although Aurora is more expensive, it does improve the reliability. B/C: DynamoDB is NoSQL, so I don't think it's suitable for this case.
upvoted 15 times
Mobidic
3 years, 8 months ago
The use case is THE classic use case for NoSQL -> DynamoDB. I saw that before seeing the answers... You can tell that there is no 'relation' involved.
upvoted 2 times
Byrney
2 years, 7 months ago
How does Aurora being more expensive "improve the cost-effectiveness" of the solution?
upvoted 3 times
AWSPro24
3 years, 8 months ago
For everyone supporting B: so are we just going to re-architect the database to go from SQL to NoSQL with no mention in the question of whether that is acceptable? Seems like a huge inference to me.
upvoted 7 times
qianhaopower
3 years, 8 months ago
No need to re-architect; the DynamoDB table can be a staging DB just supporting the query.
upvoted 1 times
ipindado2020
3 years, 8 months ago
Considering the use case... in fact it is a simple data store... no complex queries... And for sure A is very very expensive.... Then B
upvoted 4 times
tobstar86
3 years, 3 months ago
NoSQL = Not Only SQL, it's fine to use
upvoted 1 times
michaelvanzak
Most Recent 1 week, 4 days ago
Selected Answer: B
Best Answer: B. Use Amazon DynamoDB with the source ID as the partition key and the timestamp as the sort key. Use a Time to Live (TTL) to delete data after 30 days.
Why this is optimal: DynamoDB supports massive scale with predictable performance and read-heavy workloads. The schema with source ID as the partition key and timestamp as the sort key enables efficient querying of one source over time (ideal for "30-day average" lookups). TTL automatically deletes old records, reducing storage cost and management overhead. DynamoDB is serverless, highly available, and resilient by design.
upvoted 1 times
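A rough sketch of the per-source 30-day lookup this comment describes, under the same illustrative schema used in the sketches above. Note that the average is computed client side, since DynamoDB has no server-side aggregation functions.

    import time
    import boto3
    from boto3.dynamodb.conditions import Key

    table = boto3.resource("dynamodb").Table("SensorRecords")  # illustrative name

    def average_last_30_days(source_id):
        cutoff = int(time.time()) - 30 * 24 * 3600
        values = []
        kwargs = {"KeyConditionExpression": Key("source_id").eq(source_id) & Key("ts").gte(cutoff)}
        while True:  # page through results; each Query response is capped at 1 MB
            page = table.query(**kwargs)
            values.extend(item["value"] for item in page["Items"])
            if "LastEvaluatedKey" not in page:
                break
            kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
        return sum(values) / len(values) if values else None

    print(average_last_30_days("source-00042"))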
shammous
1 year, 5 months ago
Selected Answer: B
This is a typical use case (IoT). If we consider massive amounts of data, DynamoDB is the winner. In our case, we need a quick read/write data transaction system that DynamoDB can handle (optimized for read/write). There is no mention of staying with an SQL database, and converting to NoSQL is feasible. The suggested Aurora answer doesn't handle the fact that after 30 days, data can be discarded. So after a couple of months, the database will keep growing and incurring more and more costs, which would be too expensive. I found a nice article comparing both DBs: https://dynobase.dev/dynamodb-vs-aurora/
upvoted 1 times
SkyZeroZx
2 years ago
Selected Answer: A
A is the correct answer. Although DynamoDB could work, it implies rewriting the application, which can add more time and expense, and that is not in the context of the question. So A seems more fitting.
upvoted 1 times
vn_thanhtung
1 year, 9 months ago
https://aws.amazon.com/blogs/aws/new-create-an-amazon-aurora-read-replica-from-a-mysql-db-instance/ I think 4 read replicas => more cost
upvoted 1 times
dev112233xx
2 years, 2 months ago
Selected Answer: A
A is the correct answer. I don't think DynamoDB will be cheaper in this case... I just tried to calculate the cost of DynamoDB for such HUGE data: 10m x 30 days = 300m write records per month and 600m read records per month (twice the reads), and storage will be about 30 TB or more, so the price will be $4k or more. Aurora will be cheaper for sure, even with the additional 4 read replicas!
upvoted 1 times
hobokabobo
2 years, 6 months ago
Selected Answer: B
A: looks possible, and in real life I would rather take that approach, since a different database means rewriting software... but that reality is out of scope of this question, and Aurora is pricey.
B: the question explicitly says "Queries to the collected data are for one source ID at a time." We have no relations, we do not need a relational database, and so we can go with DynamoDB. Setting the TTL gives us the required retention. This is way cheaper than Aurora.
C: again DynamoDB, but it does not mention the TTL.
D: Kinesis might be possible, but what's the cache for? Odd.
A reality side note: once a daily average is generated, it doesn't have to be recalculated, because it doesn't change. Even for an average over 30 days you only need at most the total *or* average of a day plus the number of items, so instead of processing so many records over and over again, rewriting that as a rolling update would be cost effective (see the sketch below).
For the exam I would go with B.
upvoted 1 times
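A minimal sketch of the rolling-update idea above, assuming a hypothetical per-day aggregate table (DailyAggregates) keyed by source_id and a YYYY-MM-DD day string. Each write bumps that day's running sum and count, and the 30-day average is then derived from at most 30 small items instead of millions of raw records.

    import boto3
    from boto3.dynamodb.conditions import Key

    agg = boto3.resource("dynamodb").Table("DailyAggregates")  # hypothetical aggregate table

    def record_value(source_id, day, value):
        # Atomically accumulate the day's running sum and item count (value must be int/Decimal).
        agg.update_item(
            Key={"source_id": source_id, "day": day},
            UpdateExpression="ADD #s :v, #c :one",
            ExpressionAttributeNames={"#s": "sum", "#c": "count"},  # placeholders sidestep reserved-word clashes
            ExpressionAttributeValues={":v": value, ":one": 1},
        )

    def thirty_day_average(source_id, first_day, last_day):
        # first_day / last_day bound the 30-day window; at most 30 items, so no pagination needed.
        resp = agg.query(
            KeyConditionExpression=Key("source_id").eq(source_id) & Key("day").between(first_day, last_day)
        )
        total = sum(item["sum"] for item in resp["Items"])
        count = sum(item["count"] for item in resp["Items"])
        return total / count if count else None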
JohnPi
2 years, 8 months ago
Selected Answer: B
B. Use Amazon DynamoDB with the source ID as the partition key and the timestamp as the sort key. Use a Time to Live (TTL) to delete data after 30 days.
upvoted 1 times
Dionenonly
2 years, 8 months ago
Selected Answer: A
A is the answer. DynamoDB is for NoSQL.
upvoted 1 times
Kinty1982
2 years, 11 months ago
I would go with "A":
- Aurora supports MySQL, so there is no need for an architecture change
- MySQL has aggregation functions where DynamoDB does not (only client-side, so we would have to scan a few TB and fetch them - that would be a massive cost): https://stackoverflow.com/questions/26298829/does-dynamodb-support-aggregate-functions-like-avg-max-min
- Storing TBs of data would be cheaper in Aurora than in DynamoDB
upvoted 2 times
jyrajan69
3 years, 3 months ago
So many answers for B with no justification for switching from SQL to NoSQL?? It's a simple problem: the system is already running MySQL, so to improve it, add read replicas, because there are more reads than writes. There is no mention of having to delete data as a requirement, so unless someone can tell me why, my answer will have to be A.
upvoted 2 times
cldy
3 years, 6 months ago
B. Use Amazon DynamoDB with the source ID as the partition key and the timestamp as the sort key. Use a Time to Live (TTL) to delete data after 30 days.
upvoted 1 times
AzureDP900
3 years, 6 months ago
I will pick B
upvoted 2 times
TiredDad
3 years, 7 months ago
10 million records * 100,000 sources * 100 bytes (assuming each record is 100 bytes) = (10,000,000 * 100,000 * 100) / (1024 * 1024 * 1024 * 1024) = 90.9 TB of data each day! The max size supported by Aurora MySQL is 128 TB. It cannot hold 30 days of data in one Aurora instance!
upvoted 5 times
TiredDad
3 years, 7 months ago
The same reasoning applies to ElastiCache. The max size supported is 170.6 TB, so it can't hold 30 days of data. https://aws.amazon.com/about-aws/whats-new/2018/11/amazon-elasticache-for-redis-now-supports-up-to-250-nodes-per-cluster/
upvoted 2 times
TiredDad
3 years, 7 months ago
When we query data from a DynamoDB table using BatchGetItem (https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_BatchGetItem.html), a single operation can retrieve up to 16 MB of data, which can contain as many as 100 items. For example, if you ask to retrieve 100 items, but each individual item is 300 KB in size, the system returns 52 items (so as not to exceed the 16 MB limit). It also returns an appropriate UnprocessedKeys value so you can get the next page of results. If desired, your application can include its own logic to assemble the pages of results into one dataset.
As such, when you have to write your application logic, you can query one table or 30 tables. With the 30-table approach, you can archive older tables or reduce WCU/RCU to make it cost effective.
upvoted 1 times
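As a rough illustration of the pagination behaviour quoted above (table name and key layout are assumptions carried over from earlier sketches), any UnprocessedKeys are simply fed back into the next BatchGetItem request:

    import boto3

    client = boto3.client("dynamodb")

    def batch_get_all(keys, table_name="SensorRecords"):
        # keys: up to 100 dicts in low-level form, e.g. {"source_id": {"S": "..."}, "ts": {"N": "..."}}
        items = []
        request = {table_name: {"Keys": keys}}
        while request:
            resp = client.batch_get_item(RequestItems=request)
            items.extend(resp.get("Responses", {}).get(table_name, []))
            # Whatever did not fit under the 16 MB / 100-item limit comes back here for the next round.
            request = resp.get("UnprocessedKeys") or None
        return items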
AjayPrajapati
2 years, 7 months ago
I don't think it is 10 million from each source; it is 10 million combined from all of the 100,000 sources. I would still go with DynamoDB. Aurora is expensive, and it already gives 6 replicas by default, so 4 read replicas do not make sense.
upvoted 1 times
DerekKey
3 years, 7 months ago
"reliability and cost effectiveness" In my opinion - don't mix up cost with the daily average. Cost is related to the total cost of running such queries and the cost of supporting infrastructure. B wrong - having one table to serve 300 million (and more) records will have a lot of constraints not to mention only one configuration of RCU/WCU that must support writing and reports C correct - as mentioned by other people
upvoted 1 times
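For what it's worth, with a table-per-day layout the provisioned capacity of an older, now read-only table could be dialled down to cut cost; a hedged sketch with a hypothetical table name:

    import boto3

    client = boto3.client("dynamodb")

    # Hypothetical daily table that no longer receives writes: keep modest read capacity for reports.
    client.update_table(
        TableName="SensorRecords-2018-10-01",
        ProvisionedThroughput={"ReadCapacityUnits": 50, "WriteCapacityUnits": 1},
    )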
WhyIronMan
3 years, 7 months ago
I'll go for B
upvoted 1 times
Community vote distribution: A (35%), C (25%), B (20%), Other