Exam AWS Certified Machine Learning - Specialty topic 1 question 87 discussion

A Data Scientist needs to analyze employment data. The dataset contains approximately 10 million observations on people across 10 different features. During the preliminary analysis, the Data Scientist notices that income and age distributions are not normal. While income levels show a right skew as expected, with fewer individuals having a higher income, the age distribution also shows a right skew, with fewer older individuals participating in the workforce.
Which feature transformations can the Data Scientist apply to fix the incorrectly skewed data? (Choose two.)

  • A. Cross-validation
  • B. Numerical value binning
  • C. High-degree polynomial transformation
  • D. Logarithmic transformation
  • E. One hot encoding
Suggested Answer: BD

Comments

seanLu
Highly Voted 3 years ago
I would go with B,D. Refer to quantile binning and log transform below. https://towardsdatascience.com/understanding-feature-engineering-part-1-continuous-numeric-data-da4e47099a7b
upvoted 31 times
OmarSaadEldien
3 years ago
Agree with B & D: B (binning) for age, D (log transform) to bring income closer to a normal distribution.
upvoted 7 times
...
omar_bahrain
3 years ago
Agree with B & D. Both are strategies to reduce the effect of skew.
upvoted 6 times
...
...
Joe_Zhang
Highly Voted 3 years, 1 month ago
SHOULD BE C,D
upvoted 10 times
...
Togy
Most Recent 3 weeks, 3 days ago
Selected Answer: B
Binning involves grouping numerical values into discrete intervals or bins. While it can simplify the representation of a feature and potentially make the distribution appear less skewed in a histogram, it doesn't fundamentally change the underlying skewness of the continuous data. It discretizes the data rather than transforming its distribution.
upvoted 1 times
...
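The point above about binning discretizing the data can be illustrated with a small sketch. It assumes quantile binning (the variant mentioned elsewhere in this thread) and uses synthetic, hypothetical ages drawn from a shifted gamma distribution; none of the names or parameters come from the exam question. Each quantile bin ends up holding roughly the same number of rows, even though the underlying continuous values remain right-skewed.

```python
import bisect
import random
import statistics
from collections import Counter

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical right-skewed ages: 18 plus a gamma-distributed offset.
ages = [18 + rng.gammavariate(2.0, 8.0) for _ in range(10_000)]

# Three cut points split the data into four quantile-based bins.
cut_points = statistics.quantiles(ages, n=4)

# bisect_right maps each age to the index (0..3) of the bin it falls in.
age_bins = [bisect.bisect_right(cut_points, a) for a in ages]
bin_counts = Counter(age_bins)

print("cut points:", [round(c, 1) for c in cut_points])
print("rows per bin:", [bin_counts[i] for i in range(4)])
```

The cut points are unevenly spaced (the upper bins are wider), which is how quantile binning absorbs the long right tail.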
VR10
8 months, 2 weeks ago
Selected Answer: BD
D. Logarithmic Transformation: Addresses the right-skewed income and age distributions. The log function compresses large values, reducing the impact of outliers and making the distributions closer to normal.
B. Numerical Value Binning: Useful for the age distribution. By grouping ages into bins (e.g., 20-29, 30-39, etc.), you reduce the impact of the right skew caused by fewer older individuals. While it doesn't achieve a perfectly normal distribution, it often makes the feature more interpretable and manageable for modeling.
upvoted 1 times
...
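As a rough check of the log-compression effect described above, the following sketch compares sample skewness before and after a log transform. The incomes are synthetic and hypothetical (drawn from a log-normal distribution with arbitrary parameters), and `sample_skewness` is a helper written for this illustration, not part of any library.

```python
import math
import random


def sample_skewness(values):
    """Return the biased Fisher-Pearson skewness coefficient m3 / m2^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical right-skewed incomes (log-normal, arbitrary parameters).
incomes = [rng.lognormvariate(10.5, 0.8) for _ in range(20_000)]
log_incomes = [math.log(v) for v in incomes]

skew_raw = sample_skewness(incomes)
skew_log = sample_skewness(log_incomes)
print(f"skewness before log: {skew_raw:.2f}")
print(f"skewness after log:  {skew_log:.2f}")
```

Because the synthetic data is exactly log-normal, the transformed values are close to symmetric; real income data would usually improve less cleanly.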
AmeeraM
1 year ago
Selected Answer: BD
B and D
upvoted 1 times
...
Mickey321
1 year, 2 months ago
Selected Answer: BD
Agree with B & D: B (binning) for age, D (log transform) to bring income closer to a normal distribution.
upvoted 1 times
...
Shailendraa
2 years, 1 month ago
BD is correct
upvoted 2 times
...
[Removed]
2 years, 4 months ago
A and E, it asks incorrectly
upvoted 1 times
...
Sivadharan
2 years, 5 months ago
Selected Answer: BD
B & D. Reasonable explanation in below discussion.
upvoted 3 times
...
angnam
2 years, 9 months ago
BD. With age, always do quantile binning. With skewed data, always use log.
upvoted 1 times
...
Juka3lj
3 years ago
B, because we have skewed data with few exceptions. D, because a log transform can change the distribution of the data. Not C, because there is no indication in the text that the data follows any high-degree polynomial distribution such as x^10.
upvoted 5 times
...
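One way to see why a high-degree polynomial is a poor fit here: raising a positive, right-skewed variable to a power greater than 1 stretches the large values further, so the right tail gets longer rather than shorter. The sketch below uses synthetic, hypothetical log-normal data and a cube as a stand-in for a "high-degree" transform; `sample_skewness` is a helper defined only for this illustration.

```python
import random


def sample_skewness(values):
    """Return the biased Fisher-Pearson skewness coefficient m3 / m2^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(2)  # fixed seed so the run is repeatable

# Hypothetical positive, right-skewed feature (log-normal, arbitrary parameters).
x = [rng.lognormvariate(0.0, 0.5) for _ in range(20_000)]
x_cubed = [v ** 3 for v in x]  # cube used as an example polynomial transform

skew_x = sample_skewness(x)
skew_cubed = sample_skewness(x_cubed)
print(f"skewness of x:   {skew_x:.2f}")
print(f"skewness of x^3: {skew_cubed:.2f}")
```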
Vita_Rasta84444
3 years ago
should be c and d
upvoted 4 times
...
achiko
3 years ago
polynomial transformations can also be used for skewed data. https://machinelearningmastery.com/polynomial-features-transforms-for-machine-learning/
upvoted 3 times
...
jiadong
3 years ago
It seems the ans are C,D https://anshikaaxena.medium.com/how-skewed-data-can-skrew-your-linear-regression-model-accuracy-and-transfromation-can-help-62c6d3fe4c53
upvoted 5 times
...