Exam AWS Certified Machine Learning - Specialty topic 1 question 109 discussion

Question #: 109
Topic #: 1

A Data Scientist received a set of insurance records, each consisting of a record ID, the final outcome among 200 categories, and the date of the final outcome.
Some partial information on claim contents is also provided, but only for a few of the 200 categories. For each outcome category, there are hundreds of records distributed over the past 3 years. The Data Scientist wants to predict how many claims to expect in each category from month to month, a few months in advance.
What type of machine learning model should be used?

A. Classification month-to-month using supervised learning of the 200 categories based on claim contents.
B. Reinforcement learning using claim IDs and timestamps where the agent will identify how many claims in each category to expect from month to month.
C. Forecasting using claim IDs and timestamps to identify how many claims in each category to expect from month to month.
D. Classification with supervised learning of the categories for which partial information on claim contents is provided, and forecasting using claim IDs and timestamps for all other categories.

Suggested Answer: C

by cnethers at Feb. 8, 2021, 7:58 p.m.

Comments

JBX2010

Highly Voted 2 years, 10 months ago

I think it should be C, as the final outcome among the 200 categories is already known. There is no need to build a classification model. It's a pure forecasting problem.

upvoted 23 times

abdohanfi

2 years, 10 months ago

The question says "for a few". What about the many unclassified ones? I think we need to classify the rest first, as that will help us later with the month-to-month forecasting.

upvoted 2 times

...

SophieSu

Highly Voted 2 years, 10 months ago

C is my answer. There is no need to do classification, because the dataset already tells you whether the insurance has a claim or not. The claim contents do not provide additional information.

upvoted 7 times

...

Mickey321

Most Recent 11 months, 4 weeks ago

Selected Answer: C

Forecasting.

upvoted 2 times

...

kaike_reis

1 year ago

Selected Answer: C

It's a pure forecasting problem.

upvoted 1 times

...

Chelseajcole

1 year, 5 months ago

I would say no machine learning model is needed at all. A SQL COUNT with GROUP BY on the categories is enough.

upvoted 1 times
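The count-by-category idea in this comment can be sketched in a few lines. This is only an illustration under assumptions: the sample records, the category labels, and the `monthly_counts` helper are hypothetical and not part of the question. It also shows that the record ID is merely counted and never used as a feature.

```python
from collections import Counter
from datetime import date

# Hypothetical sample records: (record_id, final_outcome_category, outcome_date).
records = [
    (1, "cat_001", date(2020, 1, 5)),
    (2, "cat_001", date(2020, 1, 20)),
    (3, "cat_002", date(2020, 1, 11)),
    (4, "cat_001", date(2020, 2, 3)),
    (5, "cat_002", date(2020, 2, 14)),
    (6, "cat_002", date(2020, 2, 27)),
]


def monthly_counts(rows):
    """Count records per (category, year, month), like COUNT(*) with GROUP BY."""
    counts = Counter()
    for _record_id, category, outcome_date in rows:
        # The record ID is ignored here; each row simply adds one to its bucket.
        counts[(category, outcome_date.year, outcome_date.month)] += 1
    return counts


counts = monthly_counts(records)
for key in sorted(counts):
    print(key, counts[key])
```

Note that this aggregation yields only the historical monthly series. The question asks for counts a few months in advance, so a forecasting step would still have to be applied to these series.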

...

drcok87

1 year, 6 months ago

FinalOutcome: 1, 2, ..., 200
Columns: RecordID, FinalOutcome, Date, ClaimContents
Rows: 1, 2, ..., 100000
Note: the claim contents have partial information, and only for a few of the 200 categories.
Goal: predict how many claims to expect in each category from month to month, a few months in advance.
We don't need the claim contents; we have all we need from the first 3 columns to train a forecasting model. The answer is C.

upvoted 1 times

...

AjoseO

1 year, 6 months ago

Selected Answer: C

Forecasting using claim IDs and timestamps to identify how many claims in each category to expect from month to month.

The problem requires the prediction of the number of claims in each category for each month, which is a time series forecasting problem. The timestamps and record IDs can be used to model the underlying patterns in the data, and the model can be trained to predict the number of claims in each category for future months based on these patterns.

While the claim contents might provide additional information, the fact that partial information is only available for a few categories suggests that this information might not be enough to build a robust model, and that it might not be possible to apply supervised learning to all 200 categories. Instead, the model should be trained on the time series data (claim IDs and timestamps) for all categories, and the claim contents can be used to improve the accuracy of the model only for the categories for which such information is available.

upvoted 4 times
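To make the time series framing above concrete, here is a minimal sketch of a seasonal naive baseline applied to one category's monthly counts. It is a swapped-in baseline chosen for illustration, not the method the question or any AWS service prescribes; the function name `seasonal_naive_forecast`, the 12-month season, and the sample counts are all assumptions.

```python
def seasonal_naive_forecast(history, horizon, season_length=12):
    """Forecast `horizon` future values by repeating the last observed season."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    start_of_last_season = len(history) - season_length
    forecast = []
    for step in range(horizon):
        # Reuse the value from the same position in the last observed season.
        forecast.append(history[start_of_last_season + step % season_length])
    return forecast


# Hypothetical monthly claim counts for one category over 3 years (36 values).
history = [30, 28, 35, 40, 42, 50, 55, 53, 45, 38, 33, 31] * 3

# Predict the next 3 months for this category.
print(seasonal_naive_forecast(history, horizon=3))
```

In this framing the same function would be run once per category, on the monthly counts derived from the outcome dates. A real solution would likely use a stronger forecasting model, but a baseline like this is a common sanity check.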

...

informatica

1 year, 7 months ago

How can a forecasting or classification model be based on the claim ID? It should be unique.

upvoted 1 times

...

matteocal

2 years ago

Selected Answer: C

It's a forecasting problem, not a classification one.

upvoted 2 times

...

Dr_Kiko

2 years, 9 months ago

"Predict how many claims to expect in each category from month to month, a few months in advance." C is the only option that uses forecasting for all categories.

upvoted 2 times

...

kezzzzz

2 years, 9 months ago

D is correct. Multi-label classification to impute the missing claim contents, then forecasting what we want. C is missing the imputation part.

upvoted 5 times

f4bi4n

2 years, 2 months ago

The question is, can we get something useful out of the handful of 200s and will this impact the forecast as we could forecast the numbers without...

upvoted 1 times

...

randomnamer

2 years, 10 months ago

It is true that the final outcome is known. But C does not use the partial information from the 200 categories. Reinforcement learning currently is state of the art in stock prediction and other time series. Why waste valuable information? For me it's B.

upvoted 2 times

...

cnethers

2 years, 10 months ago

This is a supervised learning approach: Supervised learning problems can be further grouped into regression and classification problems. Classification: A classification problem is when the output variable is a category, such as “red” and “blue” or “disease” and “no disease.” Regression: A regression problem is when the output variable is a real value, such as “dollars” or “weight.”

upvoted 2 times

...