
Exam AWS Certified Big Data - Specialty topic 2 question 16 discussion

Exam question from Amazon's AWS Certified Big Data - Specialty
Question #: 16
Topic #: 2

A real-time bidding company is rebuilding their monolithic application and is focusing on serving real-time data. A large number of reads and writes are generated from thousands of concurrent users who follow items and bid on the company's sale offers.
The company is experiencing high latency during special event spikes, with millions of concurrent users.
The company needs to analyze and aggregate a part of the data in near real time to feed an internal dashboard.
What is the BEST approach for serving and analyzing data, considering the constraint of the low latency on the highly demanded data?

  • A. Use Amazon Aurora with Multi-AZ and read replicas. Use Amazon ElastiCache in front of the read replicas to serve read-only content quickly. Use the same database as the data source for the dashboard.
  • B. Use Amazon DynamoDB to store real-time data, with Amazon DynamoDB Accelerator (DAX) to serve content quickly. Use Amazon DynamoDB Streams to replay all changes to the table, and process and stream them to Amazon Elasticsearch Service with AWS Lambda.
  • C. Use Amazon RDS with Multi-AZ and a Provisioned IOPS EBS volume for storage. Enable up to five read replicas to serve read-only content quickly. Use Amazon EMR with Sqoop to import Amazon RDS data into HDFS for analysis.
  • D. Use Amazon Redshift with a DC2 node type and a multi-node cluster. Create an Amazon EC2 instance with pgpool installed. Create an Amazon ElastiCache cluster, route read requests through pgpool, and use Amazon Redshift for analysis.
Suggested Answer: D

Comments

DerekKey
3 years, 7 months ago
A wrong -> Use Amazon ElastiCache in front of the read replicas?
C wrong -> why five read replicas?
D wrong -> 1. Pgpool can run on an Amazon EC2 instance for dev and test, and on a fleet of EC2 instances with Elastic Load Balancing and Auto Scaling in production; however, AWS strongly recommends testing pgpool with your PostgreSQL client before making any changes to your architecture. 2. The company needs to analyze and aggregate part of the data in near real time to feed an internal dashboard, and with Redshift it will not be near real time: for an Amazon Redshift destination, Amazon Kinesis Data Firehose delivers data to your Amazon S3 bucket first and then issues a Redshift COPY command to load the data from S3 into your Redshift cluster. The buffer interval for data delivery to Amazon S3 is 60 to 900 seconds.
B should be correct.
upvoted 1 times
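For reference, a minimal sketch of the Streams -> Lambda -> Elasticsearch flow that option B describes. The domain endpoint, index name, and attribute names are hypothetical, and a production function would sign requests with SigV4 rather than calling the domain unsigned:

```python
# Hypothetical sketch of option B's pipeline: a Lambda function triggered by
# DynamoDB Streams that indexes each change into Amazon Elasticsearch Service.
# ES_ENDPOINT, the "bids" index, and the key names are made up for illustration.
import json
import urllib.request

ES_ENDPOINT = "https://search-bids-demo.us-east-1.es.amazonaws.com"  # hypothetical

def handler(event, context):
    for record in event["Records"]:
        if record["eventName"] not in ("INSERT", "MODIFY"):
            continue  # skip REMOVE events for this dashboard feed
        image = record["dynamodb"]["NewImage"]
        # Flatten DynamoDB's attribute-value format ({"S": ...}, {"N": ...})
        # into a plain JSON document; numbers stay as strings in this sketch.
        doc = {k: next(iter(v.values())) for k, v in image.items()}
        doc_id = next(iter(record["dynamodb"]["Keys"].values()))["S"]
        req = urllib.request.Request(
            f"{ES_ENDPOINT}/bids/_doc/{doc_id}",
            data=json.dumps(doc).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="PUT",
        )
        urllib.request.urlopen(req)  # production code would SigV4-sign this
```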
vicks316
3 years, 7 months ago
B for sure
upvoted 1 times
Venky_2020
3 years, 7 months ago
B looks to be the best solution
upvoted 1 times
k115
3 years, 7 months ago
C is the correct answer
upvoted 1 times
YashBindlish
3 years, 7 months ago
Correct Answer is B
upvoted 1 times
san2020
3 years, 7 months ago
my selection B
upvoted 1 times
ME2000
3 years, 7 months ago
“The question is the answer.” ― Thomas Vato, Questology
What is the BEST approach for serving and analyzing data, considering the constraint of the low latency on the highly demanded data?
Firstly, the question asks about analyzing data ("The company needs to analyze and aggregate a part of the data in near real time to feed an internal dashboard"), so this is all about OLAP.
Secondly, low latency on the highly demanded data ("The company is experiencing high latency during special event spikes, with millions of concurrent users") calls for some sort of caching solution.
Finally, D is the correct answer.
upvoted 1 times
AdamSmith
3 years, 7 months ago
Aggregate and analyze data in real time with Redshift? Millions of users and transactions with Redshift? Nice quote, but it doesn't work.
upvoted 4 times
DerekKey
3 years, 7 months ago
AdamSmith - you are 100% right - Redshift is not intended for such a workload
upvoted 1 times
stevenchenau
3 years, 7 months ago
Support B. Real-time bidding is a perfect use case for DynamoDB Accelerator.
upvoted 2 times
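As a rough illustration of that read path, a hedged sketch of serving hot reads through DAX; it assumes the `amazondax` client package (whose resource interface mirrors boto3's), and the cluster endpoint, table, and key names are hypothetical:

```python
# Hedged sketch of option B's cache layer: reads routed through a DAX cluster.
# Assumes the `amazondax` package; endpoint, table, and key names are made up.
from amazondax import AmazonDaxClient

dax = AmazonDaxClient.resource(
    endpoint_url="daxs://bids-cache.abc123.dax-clusters.us-east-1.amazonaws.com"
)
table = dax.Table("Offers")  # hypothetical table name

def get_offer(offer_id: str) -> dict:
    # get_item hits DAX's item cache first; misses fall through to DynamoDB,
    # so frequently read ("hot") offers are served at sub-millisecond latency.
    resp = table.get_item(Key={"offerId": offer_id})
    return resp.get("Item", {})
```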
mattyb123
3 years, 8 months ago
A setup similar to D is explained here: https://aws.amazon.com/blogs/big-data/using-pgpool-and-amazon-elasticache-for-query-caching-with-amazon-redshift/. Just keen to hear other thoughts; I selected this option last time and didn't score well in the storage section. I thought best practice was to use Redshift for OLAP instead of OLTP. Keen to get everyone else's thoughts?
upvoted 1 times
mattyb123
3 years, 8 months ago
Could be as simple as B due to DynamoDB
upvoted 2 times
mattyb123
3 years, 8 months ago
Since the question refers to a monolithic application, I am now assuming it is referring to architecture design tiers using relational databases. This link discusses the improvements from using Aurora: https://blog.acolyer.org/2019/03/25/amazon-aurora-design-considerations-for-high-throughput-cloud-native-relational-databases/
upvoted 1 times
jlpl
3 years, 8 months ago
You selected A?
upvoted 1 times
mattyb123
3 years, 8 months ago
What are your thoughts on D again? An AWS blog post shows how it can be done, and AWS wants users to consult the AWS blog posts when creating solutions. As the question asks for 'near real time' and the 'BEST approach for serving and analyzing data', Redshift can run the SQL queries the fastest, and with the pgpool caching layer attached this can be done.
upvoted 1 times
mattyb123
3 years, 8 months ago
I don't think Amazon Elasticsearch Service is the right fit for this use case.
upvoted 1 times
ranabhay
3 years, 7 months ago
Millions of concurrent users and low latency => DynamoDB and DAX seem correct. Real-time dashboard => DynamoDB Streams helps, and Elasticsearch can do the aggregation.
upvoted 10 times
mattyb123
3 years, 7 months ago
Thanks for the correction @ranabhay
upvoted 1 times
mattyb123
3 years, 7 months ago
Apologies. Have done more research on this question. I think the answer is B, not D. The reason D is incorrect is that the AWS Big Data blog post mentions only having 6 to 10 users for that use case. B is right for the following reasons: DynamoDB scales well for the millions of users; DynamoDB Streams can be used to aggregate data; a Lambda function can push to Elasticsearch; and Elasticsearch can be used for application monitoring and analyzing product-usage data, with Kibana visualisations. Lastly, the question doesn't mention anything about querying.
1. https://rockset.com/blog/live-dashboards-dynamodb-streams-lambda-elasticache/
2. https://aws.amazon.com/blogs/compute/indexing-amazon-dynamodb-content-with-amazon-elasticsearch-service-using-aws-lambda/
3. https://aws.amazon.com/blogs/startups/combining-dynamodb-and-amazon-elasticsearch-with-lambda/
4. https://d1.awsstatic.com/whitepapers/Big_Data_Analytics_Options_on_AWS.pdf?did=wp_card&trk=wp_card
upvoted 23 times
BigEv
3 years, 7 months ago
Good catch, B +1
upvoted 6 times
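For the dashboard side of B that the comments above describe, the near-real-time aggregation could be a plain Elasticsearch query that the dashboard polls. A hedged sketch; the endpoint, index, and field names are hypothetical, and the request is left unsigned for brevity:

```python
# Hypothetical sketch: aggregate bids per item in Elasticsearch to feed the
# internal dashboard. Endpoint, index, and field names are made up.
import json
import urllib.request

query = {
    "size": 0,  # aggregations only, no raw hits
    "aggs": {
        "bids_per_item": {
            "terms": {"field": "itemId", "size": 10},  # top 10 hottest items
            "aggs": {"total_amount": {"sum": {"field": "amount"}}},
        }
    },
}
req = urllib.request.Request(
    "https://search-bids-demo.us-east-1.es.amazonaws.com/bids/_search",
    data=json.dumps(query).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(json.load(urllib.request.urlopen(req)))
```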
Community vote distribution: A (35%), C (25%), B (20%), Other