Exam AWS Certified Machine Learning - Specialty topic 1 question 125 discussion

A financial company is trying to detect credit card fraud. The company observed that, on average, 2% of credit card transactions were fraudulent. A data scientist trained a classifier on a year's worth of credit card transaction data. The model needs to identify the fraudulent transactions (positives) from the regular ones
(negatives). The company's goal is to accurately capture as many positives as possible.
Which metrics should the data scientist use to optimize the model? (Choose two.)

  • A. Specificity
  • B. False positive rate
  • C. Accuracy
  • D. Area under the precision-recall curve
  • E. True positive rate
Suggested Answer: DE

Comments

littlewat
Highly Voted 2 years, 7 months ago
D, E is the answer. We need to make the recall rate (not precision) high.
upvoted 37 times
...
[Removed]
Highly Voted 2 years, 8 months ago
To maximize detection of fraud in real-world, imbalanced datasets, D and E should always be applied. https://en.wikipedia.org/wiki/Sensitivity_and_specificity https://machinelearningmastery.com/roc-curves-and-precision-recall-curves-for-imbalanced-classification/
upvoted 13 times
[Removed]
2 years, 8 months ago
Note, True positive rate = Sensitivity = Recall
upvoted 5 times
cnethers
2 years, 8 months ago
That is not correct, unfortunately. Recall = sensitivity, which is about false negatives (a Type II error); precision = specificity, which is about false positives (a Type I error). I do agree that in the real world you would focus on recall/sensitivity, i.e. reducing Type II errors. However, in the question they want to reduce the false positives, so you would need to focus on precision and specificity, minimizing Type I errors.
upvoted 5 times
yummytaco
2 years, 7 months ago
recall = sensitivity = TRUE POSITIVE RATE https://www.split.io/glossary/false-positive-rate/
upvoted 2 times
...
seanLu
2 years, 7 months ago
This is incorrect. The goal is to capture as many positives as possible, so false positives are not a concern. Suppose we have 100 samples: 2 are positive, the remaining 98 negative. We have two models: A has TP = 2, TN = 48, FP = 50, FN = 0; B has TP = 1, TN = 88, FP = 10, FN = 1. Model A has the higher false positive rate (50/98 vs 10/98). However, we will choose A, since it captures all true positives. I will go with D, E.
upvoted 6 times
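A quick sketch in Python (plain arithmetic, no libraries; the confusion-matrix counts are the hypothetical ones from the comment above) that reproduces these numbers:

```python
# Hypothetical counts from the comment above: 100 samples, 2 positive, 98 negative.
models = {
    "A": {"TP": 2, "TN": 48, "FP": 50, "FN": 0},
    "B": {"TP": 1, "TN": 88, "FP": 10, "FN": 1},
}

for name, m in models.items():
    tpr = m["TP"] / (m["TP"] + m["FN"])  # recall / sensitivity: fraction of frauds caught
    fpr = m["FP"] / (m["FP"] + m["TN"])  # fraction of legitimate transactions flagged
    print(f"Model {name}: TPR = {tpr:.2f}, FPR = {fpr:.2f}")

# Model A: TPR = 1.00, FPR = 0.51  -> catches every fraud despite many false alarms
# Model B: TPR = 0.50, FPR = 0.10  -> lower FPR, but misses half the fraud
```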
...
...
...
...
loict
Most Recent 8 months, 2 weeks ago
Selected Answer: DE
"accurately capture positives" means maximize TPR. A. NO - Specificity = TN / ( TN + FP ) is a measure of negative cases B. NO - FPR = FP / Total C. NO - given the class imbalance, overall accuracy would not help D. YES - not sure if we need that on top of E, but other options are eliminated anyway E. YES - TPR = TP / Total, what we want
upvoted 3 times
...
teka112233
9 months ago
Selected Answer: DE
The data scientist should use true positive rate and area under the precision-recall curve to optimize the model.

The true positive rate (TPR) is the proportion of actual positives that are correctly identified as such. It is also known as sensitivity or recall. In this case, it is important to capture as many fraudulent transactions as possible, so the TPR should be maximized.

The area under the precision-recall curve (AUPRC) is a measure of how well the model is able to distinguish between the positive and negative classes. It is a good metric to use when the classes are imbalanced, as in this case where only 2% of transactions are fraudulent. The AUPRC summarizes the trade-off between precision and recall across all possible thresholds.

Accuracy and specificity are not good metrics to use when the classes are imbalanced because they can be misleading. The false positive rate (FPR) is also not a good metric to use because it does not take into account the number of false negatives (missed frauds), which is exactly what the company cares about.
upvoted 1 times
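As a minimal illustration of why accuracy misleads here, the following sketch (assuming scikit-learn and a synthetic label vector with roughly 2% positives; none of this comes from the question itself) scores a do-nothing classifier that flags no transactions at all:

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

rng = np.random.default_rng(0)

# Synthetic labels: ~2% fraud, mirroring the imbalance described in the question.
y_true = (rng.random(10_000) < 0.02).astype(int)

# A useless "classifier" that predicts every transaction as legitimate.
y_pred = np.zeros_like(y_true)

print("accuracy:", accuracy_score(y_true, y_pred))        # ~0.98, looks great
print("recall (TPR):", recall_score(y_true, y_pred))      # 0.0, catches no fraud at all
```

Despite catching zero fraud, the do-nothing model scores about 98% accuracy, which is why recall/TPR and AUPRC are the metrics to optimize here.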
...
Mickey321
9 months, 1 week ago
Selected Answer: DE
Metric D: Area under the precision-recall curve (AUPRC) is a good metric to use for imbalanced classification problems, where the positive class is much less frequent than the negative class. Precision is the proportion of positive predictions that are correct, and recall (or true positive rate) is the proportion of positive cases that are detected. AUPRC summarizes the trade-off between precision and recall for different decision thresholds, and a higher AUPRC means that the model can achieve both high precision and high recall. Since the company's goal is to accurately capture as many positives as possible, AUPRC can help them evaluate how well the model performs on the minority class.

Metric E: True positive rate (TPR) is another good metric to use for imbalanced classification problems, as it measures the sensitivity of the model to the positive class. TPR is the same as recall, and it is the proportion of positive cases that are detected by the model. A higher TPR means that the model can identify more fraudulent transactions, which is the company's goal.
upvoted 1 times
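For completeness, a hedged sketch of computing AUPRC across all thresholds with scikit-learn (the labels and scores below are synthetic placeholders, not the company's data):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, auc, average_precision_score

rng = np.random.default_rng(1)

# Synthetic ground truth (~2% positives) and model scores that rank frauds somewhat higher.
y_true = (rng.random(5_000) < 0.02).astype(int)
y_score = rng.random(5_000) + 0.5 * y_true

# Precision/recall at every threshold, then the area under that curve.
precision, recall, thresholds = precision_recall_curve(y_true, y_score)
print("AUPRC (trapezoidal):", auc(recall, precision))
print("average precision:", average_precision_score(y_true, y_score))
```

average_precision_score is a common single-number summary of the same curve; either value works for comparing candidate models on the imbalanced positive class.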
...
AjoseO
1 year, 3 months ago
Selected Answer: DE
The goal is to accurately capture as many fraudulent transactions (positives) as possible. To optimize the model towards this goal, the data scientist should focus on metrics that emphasize the true positive rate and the area under the precision-recall curve. True positive rate (TPR or sensitivity) is the proportion of actual positive cases that are correctly identified as positive by the model. A higher TPR means that more fraudulent transactions are being captured. The precision-recall curve is a graph that shows the trade-off between precision and recall for different thresholds.
upvoted 1 times
AjoseO
1 year, 3 months ago
Precision is the fraction of correctly identified positive instances among all instances the model has classified as positive. Recall, also known as the true positive rate, is the fraction of positive instances that are correctly identified as positive by the model. A higher area under the precision-recall curve indicates that the model is making fewer false positive predictions and more true positive predictions, which aligns with the goal of the financial company to accurately capture as many fraudulent transactions as possible.
upvoted 1 times
...
...
ystotest
1 year, 6 months ago
Selected Answer: DE
agreed with DE
upvoted 4 times
...
f4bi4n
1 year, 11 months ago
Why not A and D? Specificity shows us how the FNR behaves, and AUC-PR includes precision and recall, which give us the ratios TP / (TP + FP) and TP / (TP + FN).
upvoted 2 times
...
SriAkula
2 years, 2 months ago
Answer: D&E
upvoted 1 times
...
KM226
2 years, 5 months ago
I meant to say D&E, not BD.
upvoted 1 times
...
KM226
2 years, 5 months ago
Selected Answer: BD
I believe the answer is B&D, which equals F1. F1 combines precision and Sensitivity.
upvoted 2 times
...
mahmoudai
2 years, 6 months ago
D&E are the only choices that take false negatives into consideration.
upvoted 1 times
f4bi4n
1 year, 11 months ago
TPR is already included in the AUC-PR. TNR is not included in any of the other options besides A.
upvoted 1 times
...
...
AShahine21
2 years, 6 months ago
Recall and TPR: D and E.
upvoted 2 times
...
kawow
2 years, 7 months ago
AB is the answer
upvoted 1 times
...