Exam AWS Certified Machine Learning - Specialty topic 1 question 100 discussion

Exam question from Amazon's AWS Certified Machine Learning - Specialty

Question #: 100
Topic #: 1

A Data Scientist is developing a binary classifier to predict whether a patient has a particular disease based on a series of test results. The Data Scientist has data on 400 patients randomly selected from the population. The disease is seen in 3% of the population.
Which cross-validation strategy should the Data Scientist adopt?

A. A k-fold cross-validation strategy with k=5
B. A stratified k-fold cross-validation strategy with k=5
C. A k-fold cross-validation strategy with k=5 and 3 repeats
D. An 80/20 stratified split between training and validation

Suggested Answer: B

by scuzzy2010 at Feb. 19, 2021, 2:26 a.m.

Comments

scuzzy2010

Highly Voted 3 years, 9 months ago

B - stratified k-fold cross-validation will enforce the class distribution in each split of the data to match the distribution in the complete training dataset.
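To make the idea concrete, here is a minimal sketch of how stratified fold assignment can work, written with the Python standard library only. It is an illustration, not the implementation used by any particular library (in practice, scikit-learn's `StratifiedKFold` offers this). The function name `stratified_kfold_indices` is hypothetical, and the data set assumes the sample contains exactly 12 positive cases (3% of 400), which a real random sample would only match approximately.

```python
import random
from collections import defaultdict

def stratified_kfold_indices(labels, k=5, seed=0):
    """Assign each sample index to one of k folds, one class at a time."""
    rng = random.Random(seed)

    # Group sample indices by class label
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across the folds
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds

# Assumed data set: 400 patients, exactly 12 positive (3%)
labels = [1] * 12 + [0] * 388
folds = stratified_kfold_indices(labels, k=5)

positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]
fold_sizes = [len(fold) for fold in folds]
print("Positives per fold:", positives_per_fold)
print("Fold sizes:", fold_sizes)
```

With 12 positives and 5 folds, every fold receives either 2 or 3 positive cases, so each validation fold contains some examples of the disease.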

upvoted 16 times

SophieSu

Highly Voted 3 years, 8 months ago

B is the correct answer. Use stratified k-fold cross-validation for imbalanced classification. A stratified train/test split is an option too, but the question specifically asks for a cross-validation strategy.

upvoted 9 times

MultiCloudIronMan

Most Recent 8 months ago

Selected Answer: B

In summary, Option B is the most appropriate strategy for handling the imbalanced dataset and ensuring reliable performance metrics for the binary classifier.

upvoted 1 times

Mickey321

1 year, 10 months ago

Selected Answer: B

Stratified k-fold cross-validation is the right choice for imbalanced data because it keeps the distribution of the target variable approximately the same in each fold. This matters for binary classification problems when the target variable is imbalanced. In this case, the disease is seen in only 3% of the population, so a random sample of 400 patients is expected to contain only about 12 positive cases. Without stratification, some folds could end up with very few positive cases or none at all, and the training and validation sets would not be representative of the actual population.
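The risk with plain (unstratified) k-fold can be checked with a small simulation. The sketch below is a rough illustration using only the Python standard library. It assumes the sample contains exactly 12 positives out of 400, and the helper name `plain_kfold_positive_counts`, the seed, and the number of trials are arbitrary choices made for this example.

```python
import random

def plain_kfold_positive_counts(labels, k, rng):
    """Shuffle all samples, cut them into k equal folds, count positives per fold."""
    shuffled = labels[:]
    rng.shuffle(shuffled)
    fold_size = len(shuffled) // k
    return [sum(shuffled[i * fold_size:(i + 1) * fold_size]) for i in range(k)]

# Assumed data set: 400 patients, exactly 12 positive (3%)
labels = [1] * 12 + [0] * 388
rng = random.Random(42)
trials = 2000

results = [plain_kfold_positive_counts(labels, 5, rng) for _ in range(trials)]

# Fraction of random splits in which at least one fold has no positive case
empty_fold_fraction = sum(1 for counts in results if min(counts) == 0) / trials
# Largest number of positives that landed in a single fold in any trial
largest_count_seen = max(max(counts) for counts in results)

print(f"Splits with at least one fold lacking positives: {empty_fold_fraction:.1%}")
print(f"Largest positive count seen in a single fold: {largest_count_seen}")
```

A fold with no positive cases cannot measure recall at all, while a fold that happens to collect many positives leaves fewer for training. Stratification removes this source of variation between folds.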

upvoted 1 times
