Exam AWS Certified Machine Learning - Specialty topic 1 question 125 discussion

A financial company is trying to detect credit card fraud. The company observed that, on average, 2% of credit card transactions were fraudulent. A data scientist trained a classifier on a year's worth of credit card transaction data. The model needs to identify the fraudulent transactions (positives) from the regular ones
(negatives). The company's goal is to accurately capture as many positives as possible.
Which metrics should the data scientist use to optimize the model? (Choose two.)

  • A. Specificity
  • B. False positive rate
  • C. Accuracy
  • D. Area under the precision-recall curve
  • E. True positive rate
Suggested Answer: DE

Comments

littlewat
Highly Voted 2 years, 7 months ago
D, E is the answer. We need to make the recall rate (not precision) high.
upvoted 37 times
...
[Removed]
Highly Voted 2 years, 8 months ago
To maximize detection of fraud in real-world, imbalanced datasets, D and E should always be applied. https://en.wikipedia.org/wiki/Sensitivity_and_specificity https://machinelearningmastery.com/roc-curves-and-precision-recall-curves-for-imbalanced-classification/
upvoted 13 times
[Removed]
2 years, 8 months ago
Note, True positive rate = Sensitivity = Recall
upvoted 5 times
cnethers
2 years, 8 months ago
That is not correct, unfortunately. Recall = sensitivity, which is about false negatives (a Type II error); precision = specificity, which is about false positives (a Type I error). I do agree that in the real world you would focus on recall/sensitivity, i.e. reducing Type II errors. However, in the question they want to reduce the false positives, so you would need to focus on precision and specificity, minimizing Type I errors.
upvoted 5 times
yummytaco
2 years, 7 months ago
recall = sensitivity = TRUE POSITIVE RATE https://www.split.io/glossary/false-positive-rate/
upvoted 2 times
...
seanLu
2 years, 7 months ago
This is incorrect. The goal is to capture as many positives as possible, so false positives are not a concern. Suppose we have 100 samples: 2 are positive, the remaining 98 negative. We have two models: A has TP = 2, TN = 48, FP = 50, FN = 0; B has TP = 1, TN = 88, FP = 10, FN = 1. Model A has the higher false positive rate (50/98 vs 10/98). However, we will choose A, since it captures all true positives. I will go with D, E.
upvoted 6 times
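A quick sketch in Python (plain arithmetic, no libraries; the confusion-matrix counts are the hypothetical ones from the comment above) that reproduces these numbers:

```python
# Hypothetical counts from the comment above: 100 samples, 2 positive, 98 negative.
models = {
    "A": {"TP": 2, "TN": 48, "FP": 50, "FN": 0},
    "B": {"TP": 1, "TN": 88, "FP": 10, "FN": 1},
}

for name, m in models.items():
    tpr = m["TP"] / (m["TP"] + m["FN"])  # recall / sensitivity: fraction of frauds caught
    fpr = m["FP"] / (m["FP"] + m["TN"])  # fraction of legitimate transactions flagged
    print(f"Model {name}: TPR = {tpr:.2f}, FPR = {fpr:.2f}")

# Model A: TPR = 1.00, FPR = 0.51  -> catches every fraud despite many false alarms
# Model B: TPR = 0.50, FPR = 0.10  -> lower FPR, but misses half the fraud
```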
...
...
...
...
loict
Most Recent 8 months, 2 weeks ago
Selected Answer: DE
"accurately capture positives" means maximize TPR. A. NO - Specificity = TN / ( TN + FP ) is a measure of negative cases B. NO - FPR = FP / Total C. NO - given the class imbalance, overall accuracy would not help D. YES - not sure if we need that on top of E, but other options are eliminated anyway E. YES - TPR = TP / Total, what we want
upvoted 3 times
...
teka112233
9 months ago
Selected Answer: DE
The data scientist should use true positive rate and area under the precision-recall curve to optimize the model.

The true positive rate (TPR) is the proportion of actual positives that are correctly identified as such. It is also known as sensitivity or recall. In this case, it is important to capture as many fraudulent transactions as possible, so the TPR should be maximized.

The area under the precision-recall curve (AUPRC) is a measure of how well the model is able to distinguish between the positive and negative classes. It is a good metric to use when the classes are imbalanced, as in this case where only 2% of transactions are fraudulent. The AUPRC summarizes the trade-off between precision and recall across all possible thresholds.

Accuracy and specificity are not good metrics to use when the classes are imbalanced because they can be misleading. The false positive rate (FPR) is also not a good metric to use because it does not take into account the number of false negatives (missed frauds), which is exactly what the company cares about.
upvoted 1 times
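As a minimal illustration of why accuracy misleads here, the following sketch (assuming scikit-learn and a synthetic label vector with roughly 2% positives; none of this comes from the question itself) scores a do-nothing classifier that flags no transactions at all:

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

rng = np.random.default_rng(0)

# Synthetic labels: ~2% fraud, mirroring the imbalance described in the question.
y_true = (rng.random(10_000) < 0.02).astype(int)

# A useless "classifier" that predicts every transaction as legitimate.
y_pred = np.zeros_like(y_true)

print("accuracy:", accuracy_score(y_true, y_pred))        # ~0.98, looks great
print("recall (TPR):", recall_score(y_true, y_pred))      # 0.0, catches no fraud at all
```

Despite catching zero fraud, the do-nothing model scores about 98% accuracy, which is why recall/TPR and AUPRC are the metrics to optimize here.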
...
Mickey321
9 months, 1 week ago
Selected Answer: DE
Metric D: Area under the precision-recall curve (AUPRC) is a good metric to use for imbalanced classification problems, where the positive class is much less frequent than the negative class. Precision is the proportion of positive predictions that are correct, and recall (or true positive rate) is the proportion of positive cases that are detected. AUPRC summarizes the trade-off between precision and recall for different decision thresholds, and a higher AUPRC means that the model can achieve both high precision and high recall. Since the company's goal is to accurately capture as many positives as possible, AUPRC can help them evaluate how well the model performs on the minority class.

Metric E: True positive rate (TPR) is another good metric to use for imbalanced classification problems, as it measures the sensitivity of the model to the positive class. TPR is the same as recall, and it is the proportion of positive cases that are detected by the model. A higher TPR means that the model can identify more fraudulent transactions, which is the company's goal.
upvoted 1 times
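For completeness, a hedged sketch of computing AUPRC across all thresholds with scikit-learn (the labels and scores below are synthetic placeholders, not the company's data):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, auc, average_precision_score

rng = np.random.default_rng(1)

# Synthetic ground truth (~2% positives) and model scores that rank frauds somewhat higher.
y_true = (rng.random(5_000) < 0.02).astype(int)
y_score = rng.random(5_000) + 0.5 * y_true

# Precision/recall at every threshold, then the area under that curve.
precision, recall, thresholds = precision_recall_curve(y_true, y_score)
print("AUPRC (trapezoidal):", auc(recall, precision))
print("average precision:", average_precision_score(y_true, y_score))
```

average_precision_score is a common single-number summary of the same curve; either value works for comparing candidate models on the imbalanced positive class.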
...
AjoseO
1 year, 3 months ago
Selected Answer: DE
The goal is to accurately capture as many fraudulent transactions (positives) as possible. To optimize the model towards this goal, the data scientist should focus on metrics that emphasize the true positive rate and the area under the precision-recall curve. True positive rate (TPR or sensitivity) is the proportion of actual positive cases that are correctly identified as positive by the model. A higher TPR means that more fraudulent transactions are being captured. The precision-recall curve is a graph that shows the trade-off between precision and recall for different thresholds.
upvoted 1 times
AjoseO
1 year, 3 months ago
Precision is the fraction of correctly identified positive instances among all instances the model has classified as positive. Recall, also known as the true positive rate, is the fraction of positive instances that are correctly identified as positive by the model. A higher area under the precision-recall curve indicates that the model is making fewer false positive predictions and more true positive predictions, which aligns with the goal of the financial company to accurately capture as many fraudulent transactions as possible.
upvoted 1 times
...
...
ystotest
1 year, 6 months ago
Selected Answer: DE
agreed with DE
upvoted 4 times
...
f4bi4n
1 year, 11 months ago
Why not A and D? Specificity shows us how the FNR behaves, and AUC-PR includes precision and recall, which give us the ratios TP / (TP + FP) and TP / (TP + FN).
upvoted 2 times
...
SriAkula
2 years, 2 months ago
Answer: D&E
upvoted 1 times
...
KM226
2 years, 5 months ago
I meant to say D&E, not BD.
upvoted 1 times
...
KM226
2 years, 5 months ago
Selected Answer: BD
I believe the answer is B&D, which equals F1. F1 combines precision and Sensitivity.
upvoted 2 times
...
mahmoudai
2 years, 6 months ago
D&E are the only choices that take false negatives into consideration.
upvoted 1 times
f4bi4n
1 year, 11 months ago
TPR is already included in the AUC-PR. TNR is not included in any of the other options besides A.
upvoted 1 times
...
...
AShahine21
2 years, 6 months ago
Recall and TPR: D and E.
upvoted 2 times
...
kawow
2 years, 7 months ago
AB is the answer
upvoted 1 times
...