Exam AWS Certified Machine Learning - Specialty topic 1 question 168 discussion

A data engineer at a bank is evaluating a new tabular dataset that includes customer data. The data engineer will use the customer data to create a new model to predict customer behavior. After creating a correlation matrix for the variables, the data engineer notices that many of the 100 features are highly correlated with each other.
Which steps should the data engineer take to address this issue? (Choose two.)

  • A. Use a linear-based algorithm to train the model.
  • B. Apply principal component analysis (PCA).
  • C. Remove a portion of highly correlated features from the dataset.
  • D. Apply min-max feature scaling to the dataset.
  • E. Apply one-hot encoding to category-based variables.
Suggested Answer: BC

Comments

ayatkhrisat
Highly Voted 2 years, 12 months ago
Selected Answer: BC
B, C should be the answer.
upvoted 18 times
MultiCloudIronMan
Most Recent 7 months, 1 week ago
Selected Answer: BC
Apply principal component analysis (PCA) (Option B): PCA is a dimensionality reduction technique that transforms the original features into a smaller set of uncorrelated components, which reduces the redundancy caused by highly correlated features.
Remove a portion of highly correlated features from the dataset (Option C): By removing some of the highly correlated features, the data engineer can simplify the model and reduce multicollinearity, which can improve the model’s performance and interpretability.
upvoted 1 time
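A minimal sketch of Option C, assuming NumPy and a purely hypothetical dataset: greedily keep a column only if its absolute Pearson correlation with every column already kept stays at or below a threshold. The 0.9 cutoff and the helper name `drop_correlated` are illustrative choices, not part of the question, and greedy dropping is only one of several ways to prune correlated features.

```python
import numpy as np

def drop_correlated(X: np.ndarray, threshold: float = 0.9) -> list[int]:
    """Hypothetical helper: return indices of columns to keep, greedily skipping
    any column whose |Pearson correlation| with an already-kept column exceeds threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))  # columns are the variables
    keep: list[int] = []
    for j in range(X.shape[1]):
        # Keep column j only if it is not highly correlated with any kept column.
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return keep

rng = np.random.default_rng(0)
base = rng.normal(size=(500, 3))
# Hypothetical data: columns 3-5 are slightly noisy copies of columns 0-2.
X = np.hstack([base, base + 0.05 * rng.normal(size=(500, 3))])

kept = drop_correlated(X, threshold=0.9)
X_reduced = X[:, kept]
print(kept)  # -> [0, 1, 2]
```

Which column of a correlated pair survives depends on column order here; in practice you might prefer to keep the feature that is easier to interpret or cheaper to collect.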
vkbajoria
1 year, 2 months ago
BD should be the answer
upvoted 1 time
vkbajoria
1 year, 1 month ago
I've changed my mind: B and C now.
upvoted 1 time
DimLam
1 year, 6 months ago
Selected Answer: BD
PCA is sensitive to the variance of features, so it's a common practice to standardize (e.g., z-score normalization) or scale (e.g., min-max scaling) the features before applying PCA. If the features are on different scales, it can distort which principal components are viewed as the most important.
upvoted 2 times
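A small NumPy sketch of the point above, using hypothetical `income`, `age`, and `tenure` columns invented for illustration. Without scaling, the first principal direction is dominated by the large-valued income column. After z-score standardization, the component scores that PCA produces are uncorrelated with each other. PCA is computed here directly from the SVD of the centered data rather than through a library such as scikit-learn.

```python
import numpy as np

def principal_directions(X: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Center X and return (centered data, principal directions as rows of Vt)."""
    Xc = X - X.mean(axis=0)
    # SVD of the centered data; rows of Vt are ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc, Vt

rng = np.random.default_rng(1)
income = rng.normal(50_000, 15_000, size=1_000)    # large numeric scale (hypothetical)
age = rng.normal(40, 12, size=1_000)               # small numeric scale (hypothetical)
tenure = 0.5 * age + rng.normal(0, 2, size=1_000)  # strongly correlated with age
X = np.column_stack([income, age, tenure])

# Unscaled: the first direction loads almost entirely on income.
_, Vt_raw = principal_directions(X)
print("unscaled PC1 loadings:", np.round(np.abs(Vt_raw[0]), 3))

# Standardized: every feature contributes on an equal footing.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
Xc, Vt = principal_directions(Xs)
Z = Xc @ Vt[:2].T                                  # scores on the first two components
print("scaled PC1 loadings:", np.round(np.abs(Vt[0]), 3))
print("corr(PC1, PC2):", round(float(np.corrcoef(Z, rowvar=False)[0, 1]), 6))
```

In this synthetic example, the scaled first component is driven by the correlated `age`/`tenure` pair instead of by whichever column happens to have the largest units.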
kaike_reis
1 year, 8 months ago
Selected Answer: BD
Well, this is the first time I go with the suggested answer. D then B is the way: we want to solve the underlying correlation problem. Options A and E don't solve this problem, so they're wrong. Option C only partially solves the problem, so it is wrong too. Since the question asks for steps, the correct choices are D (ensure that all variables are on the same scale) and B (apply PCA, which removes all correlation from the dataset while keeping most of the information). Again, from my perspective, C is vague, and using B removes the need to drop highly correlated features.
upvoted 2 times
DimLam
1 year, 6 months ago
Agreed. Since the question asks which steps to perform, it is logical to answer "scale the features and apply PCA". We can't answer "remove a portion of the correlated features and then apply PCA", or vice versa, as that doesn't make sense. It would make sense if the question asked "Which techniques can the engineer apply?" or something similar.
upvoted 1 time
Mickey321
1 year, 9 months ago
Selected Answer: BC
The most effective steps to address the high correlation among the features in the dataset are removing a portion of the highly correlated features and applying principal component analysis (PCA) for dimensionality reduction. These steps will help improve the data quality and predictive performance of the model.
upvoted 1 time
AjoseO
2 years, 2 months ago
Selected Answer: BC
PCA is a widely used technique for reducing the dimensionality of high-dimensional datasets while retaining as much of the original variability as possible. It is particularly useful when dealing with highly correlated features. Removing a portion of highly correlated features can be another effective way to address the issue of high correlation. By removing some of the correlated features, the model can become less complex and less prone to overfitting.
upvoted 2 times
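To make "retaining as much of the original variability as possible" concrete, here is a NumPy sketch on synthetic data (an assumption, not the bank's dataset): 100 features generated from 5 hidden factors plus a little noise, so most features are highly correlated. Each component's explained variance ratio is its squared singular value divided by the sum of all squared singular values; the 95% cutoff used to pick the number of components is a common but arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(2)
n_samples, n_latent, n_features = 2_000, 5, 100
# Hypothetical data: 100 observed features driven by 5 latent factors plus small noise.
latent = rng.normal(size=(n_samples, n_latent))
loadings = rng.normal(size=(n_latent, n_features))
X = latent @ loadings + 0.1 * rng.normal(size=(n_samples, n_features))

Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize before PCA
s = np.linalg.svd(Xs, compute_uv=False)     # singular values, in descending order
explained = s**2 / np.sum(s**2)             # explained variance ratio per component
cumulative = np.cumsum(explained)

# Smallest number of components whose cumulative explained variance reaches 95%.
k = int(np.searchsorted(cumulative, 0.95) + 1)
print(f"{k} of {n_features} components explain {cumulative[k - 1]:.1%} of the variance")
```

Because the synthetic features share only 5 underlying factors, a handful of components carries almost all of the variance, which is the redundancy that the correlation matrix in the question hints at.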
ovokpus
2 years, 10 months ago
Selected Answer: BC
Min-max scaling does nothing to fix the issue here.
upvoted 3 times
ckkobe24
2 years, 12 months ago
Selected Answer: BC
BC for me. A min-max scaler cannot remove multicollinearity, can it?
upvoted 4 times
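The two comments above can be checked numerically. Min-max scaling maps each column through a positive affine transformation, x' = (x - min) / (max - min), and Pearson correlation is unchanged by such transformations, so the correlation between features (and thus the multicollinearity) is exactly the same after scaling. A minimal NumPy sketch with two hypothetical, strongly correlated columns; `min_max_scale` is an illustrative helper, not a library function:

```python
import numpy as np

def min_max_scale(X: np.ndarray) -> np.ndarray:
    """Rescale each column to [0, 1]; a positive affine map applied per column."""
    mins, maxs = X.min(axis=0), X.max(axis=0)
    return (X - mins) / (maxs - mins)

rng = np.random.default_rng(3)
a = rng.normal(100, 30, size=1_000)
b = 2.0 * a + rng.normal(0, 10, size=1_000)  # hypothetical feature highly correlated with a
X = np.column_stack([a, b])

before = np.corrcoef(X, rowvar=False)[0, 1]
after = np.corrcoef(min_max_scale(X), rowvar=False)[0, 1]
# Both values agree: scaling changes the units, not the correlation.
print(round(float(before), 4), round(float(after), 4))
```

Scaling can still matter as a preprocessing step for PCA or distance-based models, but on its own it leaves the correlation structure untouched.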