Exam AWS Certified Machine Learning - Specialty topic 1 question 236 discussion

Question #: 236
Topic #: 1

An online store is predicting future book sales by using a linear regression model that is based on past sales data. The data includes duration, a numerical feature that represents the number of days that a book has been listed in the online store. A data scientist performs an exploratory data analysis and discovers that the relationship between book sales and duration is skewed and non-linear.

Which data transformation step should the data scientist take to improve the predictions of the model?

A. One-hot encoding
B. Cartesian product transformation
C. Quantile binning
D. Normalization

Suggested Answer: C

by sevosevo at March 18, 2023, 2:21 p.m.

Comments

CloudHandsOn

Highly Voted 11 months, 3 weeks ago

Selected Answer: C

C. Quantile binning: Quantile binning (a form of discretization) divides a continuous variable into bins whose boundaries are quantiles, so each bin holds roughly the same number of observations. This helps with skewed data because the values are spread evenly across the bins. The drawback is that the numerical feature becomes a categorical one, which discards the ordering of the bins and the variation of 'duration' within each bin, so it is not ideal for a regression model. Of the given options, C is still the most suitable, albeit not ideal. Outside the constraints of the question, the data scientist should also consider a logarithmic or polynomial transformation, which addresses the non-linearity more directly.
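A minimal sketch of the two ideas in this comment, using only the Python standard library. The duration values are hypothetical (they are not part of the question), and the choice of four bins is an assumption made for illustration. It shows equal-frequency bin assignment next to a log transform of the same feature.

```python
import bisect
import math
import statistics

# Hypothetical, right-skewed 'duration' values (days listed); not taken from the question.
durations = [1, 2, 3, 4, 5, 7, 9, 12, 20, 35, 60, 400]
N_BINS = 4  # assumed number of bins, chosen only for illustration

# Interior cut points (quartiles here) that split the data into equal-frequency bins.
edges = statistics.quantiles(durations, n=N_BINS, method="inclusive")


def assign_bin(value, cut_points):
    """Return the bin index (0 .. len(cut_points)) that value falls into."""
    return bisect.bisect_right(cut_points, value)


bins = [assign_bin(d, edges) for d in durations]
counts = [bins.count(b) for b in range(N_BINS)]

# Alternative mentioned in the comment: a log transform keeps the feature
# numeric and ordered while compressing the long right tail.
log_durations = [math.log1p(d) for d in durations]

print("cut points:", edges)
print("bin per value:", bins)
print("values per bin:", counts)
print("raw range:", min(durations), "to", max(durations))
print("log range:", min(log_durations), "to", max(log_durations))
```

Every bin ends up with the same number of values even though the raw range is dominated by one very large duration, which is the sense in which binning evens out a skewed feature.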

upvoted 5 times

sevosevo

Highly Voted 1 year, 9 months ago

Selected Answer: C

https://docs.aws.amazon.com/machine-learning/latest/dg/data-transformations-reference.html

upvoted 5 times

loict

Most Recent 1 year, 3 months ago

Selected Answer: C

A. NO - One-hot encoding is for featurization of categories
B. NO -
C. YES - Quantile binning can make data linear (https://docs.aws.amazon.com/machine-learning/latest/dg/data-transformations-reference.html#quantile-binning-transformation)
D. NO - Normalization will recenter the data, not change the relationship
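The point about option D can be checked with a small sketch. The data below is synthetic (sales are assumed to grow with the logarithm of duration, which is not stated in the question), and min-max scaling stands in for normalization. Because the scaling is a linear map of the feature, the linear correlation with sales is unchanged, so a straight-line model fits no better afterwards.

```python
import math

# Synthetic data: hypothetical durations and an assumed log-shaped sales curve.
durations = [1, 2, 3, 4, 5, 7, 9, 12, 20, 35, 60, 400]
sales = [100 * math.log1p(d) for d in durations]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def min_max_normalize(xs):
    """Rescale values linearly onto the interval [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


normalized = min_max_normalize(durations)

r_raw = pearson(durations, sales)
r_normalized = pearson(normalized, sales)

print("correlation with raw duration:       ", r_raw)
print("correlation with normalized duration:", r_normalized)
```

The two correlations agree up to floating-point error, and both stay well below 1 on this data, so the non-linearity is still there after normalization.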

upvoted 2 times

Mickey321

1 year, 4 months ago

Selected Answer: C

quantile binning

upvoted 1 time

jackzhao

1 year, 9 months ago

C is correct

upvoted 3 times

blanco750

1 year, 9 months ago

Selected Answer: C

C is the best answer I guess

upvoted 3 times

oso0348

1 year, 9 months ago

Selected Answer: C

The correct answer is C, quantile binning. This transformation divides the values of the feature (in this case, duration) into quantile-based bins that each hold roughly the same number of observations, so the bins are equal in frequency rather than in width, and replaces each value with its bin. This can help capture a non-linear relationship in skewed data: once the bins are treated as categories, the linear regression model learns a separate weight for each bin instead of a single slope for duration, which lets it better predict future book sales.
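To illustrate why binning can help a linear model here, the sketch below compares two least-squares fits on synthetic data: a straight line on raw duration, and a fit on quantile-bin indicators, which amounts to predicting the mean sales of each bin. The durations, the log-shaped sales curve, and the four-bin setting are all assumptions made for the example, not values from the question.

```python
import bisect
import math
import statistics

# Synthetic data: hypothetical durations and an assumed log-shaped sales curve.
durations = [1, 2, 3, 4, 5, 7, 9, 12, 20, 35, 60, 400]
sales = [100 * math.log1p(d) for d in durations]


def sse_straight_line(xs, ys):
    """Sum of squared errors of the ordinary least-squares line y = a + b * x."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def sse_quantile_bins(xs, ys, n_bins):
    """Sum of squared errors when each quantile bin predicts its own mean.

    Least squares on one-hot bin indicators yields exactly these bin means.
    """
    edges = statistics.quantiles(xs, n=n_bins, method="inclusive")
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(bisect.bisect_right(edges, x), []).append(y)
    total = 0.0
    for group in groups.values():
        bin_mean = statistics.fmean(group)
        total += sum((y - bin_mean) ** 2 for y in group)
    return total


sse_line = sse_straight_line(durations, sales)
sse_binned = sse_quantile_bins(durations, sales, n_bins=4)

print("SSE, straight line on raw duration:", sse_line)
print("SSE, one weight per quantile bin:  ", sse_binned)
```

On this data the binned fit has the smaller error because the single large duration no longer drags the slope. The result depends on the data and the number of bins, so it should be read as an illustration rather than a general guarantee.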

upvoted 4 times
