
Exam AWS Certified Big Data - Specialty topic 2 question 16 discussion

Exam question from Amazon's AWS Certified Big Data - Specialty
Question #: 16
Topic #: 2

A real-time bidding company is rebuilding their monolithic application and is focusing on serving real-time data. A large number of reads and writes are generated from thousands of concurrent users who follow items and bid on the company's sale offers.
The company is experiencing high latency during special event spikes, with millions of concurrent users.
The company needs to analyze and aggregate a part of the data in near real time to feed an internal dashboard.
What is the BEST approach for serving and analyzing data, considering the constraint of the low latency on the highly demanded data?

  • A. Use Amazon Aurora with Multi-AZ and read replicas. Use Amazon ElastiCache in front of the read replicas to serve read-only content quickly. Use the same database as the data source for the dashboard.
  • B. Use Amazon DynamoDB to store real-time data, with Amazon DynamoDB Accelerator (DAX) to serve content quickly. Use Amazon DynamoDB Streams to replay all changes to the table, and process and stream them to Amazon Elasticsearch Service with AWS Lambda.
  • C. Use Amazon RDS with Multi-AZ and a Provisioned IOPS EBS volume for storage. Enable up to five read replicas to serve read-only content quickly. Use Amazon EMR with Sqoop to import Amazon RDS data into HDFS for analysis.
  • D. Use Amazon Redshift with a DC2 node type and a multi-node cluster. Create an Amazon EC2 instance with pgpool installed. Create an Amazon ElastiCache cluster, route read requests through pgpool, and use Amazon Redshift for analysis.
Suggested Answer: D

Comments

DerekKey
3 years, 7 months ago
A wrong -> Use Amazon ElastiCache in front of the read replicas?
C wrong -> why five read replicas?
D wrong -> 1. Pgpool can run on an Amazon EC2 instance for dev and test, and on a fleet of EC2 instances with Elastic Load Balancing and Auto Scaling in production; however, AWS strongly recommends testing pgpool with your PostgreSQL client before making any changes to your architecture. 2. The company needs to analyze and aggregate part of the data in near real time to feed an internal dashboard, and with Redshift it will not be near real time: for an Amazon Redshift destination, Amazon Kinesis Data Firehose delivers data to your Amazon S3 bucket first and then issues a Redshift COPY command to load the data from S3 into your Redshift cluster. The buffer interval for data delivery to Amazon S3 is 60 to 900 seconds.
B should be correct.
upvoted 1 times
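For reference, a minimal sketch of the Streams -> Lambda -> Elasticsearch flow that option B describes. The domain endpoint, index name, and attribute names are hypothetical, and a production function would sign requests with SigV4 rather than calling the domain unsigned:

```python
# Hypothetical sketch of option B's pipeline: a Lambda function triggered by
# DynamoDB Streams that indexes each change into Amazon Elasticsearch Service.
# ES_ENDPOINT, the "bids" index, and the key names are made up for illustration.
import json
import urllib.request

ES_ENDPOINT = "https://search-bids-demo.us-east-1.es.amazonaws.com"  # hypothetical

def handler(event, context):
    for record in event["Records"]:
        if record["eventName"] not in ("INSERT", "MODIFY"):
            continue  # skip REMOVE events for this dashboard feed
        image = record["dynamodb"]["NewImage"]
        # Flatten DynamoDB's attribute-value format ({"S": ...}, {"N": ...})
        # into a plain JSON document; numbers stay as strings in this sketch.
        doc = {k: next(iter(v.values())) for k, v in image.items()}
        doc_id = next(iter(record["dynamodb"]["Keys"].values()))["S"]
        req = urllib.request.Request(
            f"{ES_ENDPOINT}/bids/_doc/{doc_id}",
            data=json.dumps(doc).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="PUT",
        )
        urllib.request.urlopen(req)  # production code would SigV4-sign this
```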
vicks316
3 years, 7 months ago
B for sure
upvoted 1 times
Venky_2020
3 years, 7 months ago
B looks to be the best solution
upvoted 1 times
k115
3 years, 7 months ago
C is the correct answer
upvoted 1 times
YashBindlish
3 years, 7 months ago
Correct Answer is B
upvoted 1 times
san2020
3 years, 7 months ago
my selection B
upvoted 1 times
ME2000
3 years, 7 months ago
“The question is the answer.” ― Thomas Vato, Questology
What is the BEST approach for serving and analyzing data, considering the constraint of the low latency on the highly demanded data?
Firstly, the question asks about analyzing data ("The company needs to analyze and aggregate a part of the data in near real time to feed an internal dashboard"), so this is all about OLAP.
Secondly, low latency on the highly demanded data ("The company is experiencing high latency during special event spikes, with millions of concurrent users") calls for some sort of caching solution.
Finally, D is the correct answer.
upvoted 1 times
AdamSmith
3 years, 7 months ago
Aggregate and analyze data in real time with Redshift? Millions of users and transactions with Redshift? Nice quote, but it doesn't work.
upvoted 4 times
DerekKey
3 years, 7 months ago
AdamSmith - you are 100% right - Redshift is not intended for such a workload
upvoted 1 times
stevenchenau
3 years, 7 months ago
Support B. Real-time bidding is a perfect use case for DynamoDB Accelerator.
upvoted 2 times
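As a rough illustration of that read path, a hedged sketch of serving hot reads through DAX; it assumes the `amazondax` client package (whose resource interface mirrors boto3's), and the cluster endpoint, table, and key names are hypothetical:

```python
# Hedged sketch of option B's cache layer: reads routed through a DAX cluster.
# Assumes the `amazondax` package; endpoint, table, and key names are made up.
from amazondax import AmazonDaxClient

dax = AmazonDaxClient.resource(
    endpoint_url="daxs://bids-cache.abc123.dax-clusters.us-east-1.amazonaws.com"
)
table = dax.Table("Offers")  # hypothetical table name

def get_offer(offer_id: str) -> dict:
    # get_item hits DAX's item cache first; misses fall through to DynamoDB,
    # so frequently read ("hot") offers are served at sub-millisecond latency.
    resp = table.get_item(Key={"offerId": offer_id})
    return resp.get("Item", {})
```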
mattyb123
3 years, 8 months ago
A setup similar to D is explained here: https://aws.amazon.com/blogs/big-data/using-pgpool-and-amazon-elasticache-for-query-caching-with-amazon-redshift/. Just keen to hear other thoughts; I selected this option last time and didn't score well in the storage section. I thought best practice was to use Redshift for OLAP instead of OLTP. Keen to get everyone else's thoughts?
upvoted 1 times
mattyb123
3 years, 8 months ago
Could be as simple as B due to DynamoDB
upvoted 2 times
mattyb123
3 years, 8 months ago
Since the question refers to a monolithic application, I am now assuming it is referring to architecture design tiers using relational databases. This link discusses the improvements from using Aurora: https://blog.acolyer.org/2019/03/25/amazon-aurora-design-considerations-for-high-throughput-cloud-native-relational-databases/
upvoted 1 times
jlpl
3 years, 8 months ago
You selected A?
upvoted 1 times
mattyb123
3 years, 8 months ago
What are your thoughts on D again? An AWS blog post shows how it can be done, and AWS wants users to consult the AWS blog posts when creating solutions. As the question asks for 'near real time' and the 'BEST approach for serving and analyzing data', Redshift can run the SQL queries the fastest, and with the pgpool caching layer attached this can be done.
upvoted 1 times
mattyb123
3 years, 8 months ago
I don't think Amazon Elasticsearch Service is the right fit for this use case.
upvoted 1 times
ranabhay
3 years, 7 months ago
Millions of concurrent users and low latency => DynamoDB and DAX seem correct. Real-time dashboard => DynamoDB Streams helps, and Elasticsearch can do the aggregation.
upvoted 10 times
mattyb123
3 years, 7 months ago
Thanks for the correction @ranabhay
upvoted 1 times
mattyb123
3 years, 7 months ago
Apologies. Have done more research on this question. I think the answer is B, not D. The reason D is incorrect is that the AWS Big Data blog post mentions only having 6 to 10 users for that use case. B is right for the following reasons: DynamoDB scales well for the millions of users; DynamoDB Streams can be used to aggregate data; a Lambda function can push to Elasticsearch; and Elasticsearch can be used for application monitoring and analyzing product-usage data, with Kibana visualisations. Lastly, the question doesn't mention anything about querying.
1. https://rockset.com/blog/live-dashboards-dynamodb-streams-lambda-elasticache/
2. https://aws.amazon.com/blogs/compute/indexing-amazon-dynamodb-content-with-amazon-elasticsearch-service-using-aws-lambda/
3. https://aws.amazon.com/blogs/startups/combining-dynamodb-and-amazon-elasticsearch-with-lambda/
4. https://d1.awsstatic.com/whitepapers/Big_Data_Analytics_Options_on_AWS.pdf?did=wp_card&trk=wp_card
upvoted 23 times
BigEv
3 years, 7 months ago
Good catch, B +1
upvoted 6 times
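For the dashboard side of B that the comments above describe, the near-real-time aggregation could be a plain Elasticsearch query that the dashboard polls. A hedged sketch; the endpoint, index, and field names are hypothetical, and the request is left unsigned for brevity:

```python
# Hypothetical sketch: aggregate bids per item in Elasticsearch to feed the
# internal dashboard. Endpoint, index, and field names are made up.
import json
import urllib.request

query = {
    "size": 0,  # aggregations only, no raw hits
    "aggs": {
        "bids_per_item": {
            "terms": {"field": "itemId", "size": 10},  # top 10 hottest items
            "aggs": {"total_amount": {"sum": {"field": "amount"}}},
        }
    },
}
req = urllib.request.Request(
    "https://search-bids-demo.us-east-1.es.amazonaws.com/bids/_search",
    data=json.dumps(query).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(json.load(urllib.request.urlopen(req)))
```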
Community vote distribution: A (35%), C (25%), B (20%), Other