Exam AWS Certified Machine Learning - Specialty topic 1 question 100 discussion

A Data Scientist is developing a binary classifier to predict whether a patient has a particular disease based on a series of test results. The Data Scientist has data on 400 patients randomly selected from the population. The disease is seen in 3% of the population.
Which cross-validation strategy should the Data Scientist adopt?

  • A. A k-fold cross-validation strategy with k=5
  • B. A stratified k-fold cross-validation strategy with k=5
  • C. A k-fold cross-validation strategy with k=5 and 3 repeats
  • D. An 80/20 stratified split between training and validation
Suggested Answer: B

Comments

scuzzy2010
Highly Voted 3 years, 7 months ago
B - stratified k-fold cross-validation will enforce the class distribution in each split of the data to match the distribution in the complete training dataset.
upvoted 16 times
...
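To make the comment above concrete, here is a minimal standard-library sketch of what stratification does to the folds. It is not how any particular library implements it (in practice scikit-learn's `StratifiedKFold` is the usual tool); the helper name `stratified_k_fold` and the assumption of exactly 12 positive patients (3% of 400) are illustrative only.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Hypothetical helper: split indices into k folds, spreading each class evenly."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    position = 0  # running counter so fold sizes stay balanced across classes
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices round-robin: per-class counts differ by at most one.
        for idx in indices:
            folds[position % k].append(idx)
            position += 1
    return folds


# Assumed data: 400 patients, 12 of them positive (3%).
labels = [1] * 12 + [0] * 388
folds = stratified_k_fold(labels, k=5)

fold_sizes = [len(fold) for fold in folds]
positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]
print(fold_sizes)
print(positives_per_fold)
```

Each fold serves once as the validation set while the other four are used for training, so every validation fold is guaranteed to contain 2 or 3 of the 12 assumed positive cases.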
SophieSu
Highly Voted 3 years, 7 months ago
B is the correct answer. Use stratified k-fold cross-validation for imbalanced classification. A stratified train/test split is an option too, but the question specifically asks for a "cross-validation" strategy.
upvoted 9 times
...
MultiCloudIronMan
Most Recent 6 months, 1 week ago
Selected Answer: B
In summary, Option B is the most appropriate strategy for handling the imbalanced dataset and ensuring reliable performance metrics for the binary classifier.
upvoted 1 times
...
Mickey321
1 year, 8 months ago
Selected Answer: B
Stratified k-fold is the right choice for imbalanced data. It keeps the distribution of the target variable approximately the same in each fold, which matters for binary classification when the target variable is imbalanced. In this case the disease is seen in only 3% of the population, so without stratification there is a risk that some training and validation folds will not be representative of the actual population.
upvoted 1 times
...
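The risk described above can be quantified with a short calculation. This is a sketch under assumptions not stated in the question: that the sample contains exactly 12 positive patients (the expected 3% of 400) and that a plain, unstratified fold is a uniformly random subset of 80 patients. The number of positives in such a fold then follows a hypergeometric distribution.

```python
from math import comb

n_patients = 400
n_positive = 12              # assumption: exactly 3% of the sample is positive
k = 5
fold_size = n_patients // k  # 80 patients per fold

# Probability that one specific random fold contains no positive case at all:
# choose all 80 members from the 388 negatives, out of all ways to choose 80 from 400.
p_zero = comb(n_patients - n_positive, fold_size) / comb(n_patients, fold_size)

# Expected number of positives in a random fold.
expected_positives = n_positive * fold_size / n_patients

print(f"P(no positives in a given fold) = {p_zero:.3f}")
print(f"Expected positives per fold     = {expected_positives:.1f}")
```

Under these assumptions, a given unstratified fold has a chance of roughly 6 to 7 percent of containing no diseased patient, in which case metrics such as recall cannot be computed on it. Stratification removes that possibility.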
ADVIT
1 year, 10 months ago
B https://towardsdatascience.com/understanding-8-types-of-cross-validation-80c935a4976d
upvoted 1 times
...
Valcilio
2 years, 1 month ago
Selected Answer: B
Stratified cross validation is for unbalanced data like this!
upvoted 1 times
...
AWS__Newbie
3 years, 6 months ago
Why K=5?
upvoted 2 times
eeah
3 years, 1 month ago
K=5 is just a standard choice.
upvoted 1 times
...
...
Vita_Rasta84444
3 years, 6 months ago
Yes, B...
upvoted 1 times
...