Exam AWS Certified Big Data - Specialty topic 1 question 20 discussion

Exam question from Amazon's AWS Certified Big Data - Specialty

Question #: 20
Topic #: 1

A game company needs to properly scale its game application, which is backed by DynamoDB. Amazon Redshift has the past two years of historical data. Game traffic varies throughout the year based on various factors such as season, movie releases, and holiday season. An administrator needs to calculate how much read and write throughput should be provisioned for the DynamoDB table for each week in advance.
How should the administrator accomplish this task?

A. Feed the data into Amazon Machine Learning and build a regression model.
B. Feed the data into Spark MLlib and build a random forest model.
C. Feed the data into Apache Mahout and build a multi-classification model.
D. Feed the data into Amazon Machine Learning and build a binary classification model.

Suggested Answer: B

by jlpl at Aug. 10, 2019, 2:12 p.m.

Comments

Bulti

3 years, 7 months ago

The answer is A. The key is "Redshift has the past two years of historical data". This means we have labeled data that we can use to train a linear regression model to predict RCU and WCU.

upvoted 4 times

san2020

3 years, 7 months ago

My selection: A.

upvoted 3 times

AdamSmith

3 years, 7 months ago

"needs to calculate how much read and write throughput should be provisioned for DynamoDB table for each week in advance": a regression model is more suitable for this job. You can just run two models, one for WCU and one for RCU. B is for detecting anomalies.
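As a rough illustration of the two-model idea in this comment, here is a minimal sketch in plain Python rather than Amazon Machine Learning. The single feature (expected active players per week) and all of the numbers are hypothetical and are not taken from the question; a real model would presumably use features such as season, movie releases, and holidays from the Redshift history.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict(model, x):
    """Evaluate a fitted (slope, intercept) pair at x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical weekly history:
# (active players in thousands, peak RCU, peak WCU)
history = [
    (10, 520, 210),
    (20, 1010, 405),
    (30, 1490, 600),
    (40, 2020, 790),
]
players = [row[0] for row in history]

# One regression model per target, as the comment suggests.
rcu_model = fit_line(players, [row[1] for row in history])
wcu_model = fit_line(players, [row[2] for row in history])

# Forecast for a week with an assumed 35 thousand active players.
expected_players = 35
rcu_forecast = predict(rcu_model, expected_players)
wcu_forecast = predict(wcu_model, expected_players)
print(round(rcu_forecast), round(wcu_forecast))
```

Each target gets its own model, so predicting two numbers only requires fitting the same kind of regression twice on the same features.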

upvoted 4 times

kalpanareddy

3 years, 7 months ago

Normally, when predicting numbers such as salary or price, we use a regression model based on the historical data. So personally, I agree with B, as it is used to predict the read and write throughput numbers.

upvoted 2 times

ME2000

3 years, 7 months ago

The correct answer is B ... An anomaly score with low values indicates that the data point is considered “normal” whereas high values indicate the presence of an anomaly. The definitions of “low” and “high” depend on the application, but common practice suggests that scores beyond three standard deviations from the mean score are considered anomalous. https://aws.amazon.com/blogs/machine-learning/use-the-built-in-amazon-sagemaker-random-cut-forest-algorithm-for-anomaly-detection/
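The three-standard-deviation rule quoted in this comment can be sketched as follows. This is only an illustration of the thresholding step: the scores are made up, the function name is hypothetical, and nothing here calls the SageMaker Random Cut Forest algorithm itself.

```python
from statistics import mean, pstdev


def flag_anomalies(scores, num_std=3.0):
    """Return indices of scores more than num_std standard deviations above the mean."""
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation of the scores
    cutoff = mu + num_std * sigma
    return [i for i, s in enumerate(scores) if s > cutoff]


# Hypothetical anomaly scores: twenty "normal" points and one outlier.
scores = [1.0] * 20 + [10.0]
print(flag_anomalies(scores))
```

Note that this flags unusual data points; it does not produce a numeric forecast of read or write throughput.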

upvoted 3 times

antoneti

3 years, 8 months ago

Not sure about A, mainly because regression predicts just one numeric value, not two (RCU/WCU). Random forest (B) is also used for regression tasks, as suggested here: https://www.newgenapps.com/blog/random-forest-analysis-in-ml-and-when-to-use-it

upvoted 2 times

bigdatalearner

3 years, 8 months ago

A. "Feed the data into Amazon Machine Learning and build a regression model" is the right answer, because a regression model is used for numeric values, and here we are looking for RCU/WCU, which are numeric as well. B could be an option, but it is not cost-effective, as it needs an EMR cluster.

upvoted 4 times