Home › Forums › AWS › AWS Certified Machine Learning – Specialty › Problem with a Practice Exam Answer
-
The question begins “A Machine Learning Specialist at a mobile network company is tasked to develop a model that predicts customers who are likely to cancel subscriptions. Since the customer churn costs about $50, the company plans on giving a $3 retention incentive for unhappy customers. The model was evaluated on a test dataset of 100 customers. The confusion matrix for the model is given below:”
The confusion matrix is displayed, showing 8 true positives, 3 false negatives, 8 false positives, and 81 true negatives.
It then asks why the model is feasible to deploy in production.
Now first, I just want to say that I think this is a great question. The problem is that even though I know why it is feasible, none of the answers are actually correct.
The explanation for how to get to the correct answer is very, very good. It provides background, uses equations, and then finally says “because the total loss is less when the model is deployed than when it is not, the model is feasible to deploy in production”. This is absolutely correct. The problem is that this explanation does not actually match any of the answers. I won’t go through each answer, I’ll just focus on the one marked correct.
The first part of the correct answer says the accuracy is 89%. No problem.
But the second part of the “correct answer” says that the “overall spending influenced by the false negatives is greater than the false positives”. And this is clearly not the right reasoning.
There are 8 false positives. Each false positive spends $3 due to the incentive. So the “overall spending influenced by the false positives” is $24.
The “overall spending influenced by the false negatives” is actually more complicated. There are 3 false negatives. Each false negative costs the company $47, if the assumption is that they could’ve been kept by the incentive. So the total cost to the company given this assumption is $141. But this is not the same thing as spending. There is no spending on false negatives, because the company spends nothing to keep them. So really, this number is $0.
Either interpretation of the “spending influenced by the false negatives” is ends up being wrong. If it is $141, it is greater (not less) than the spending influenced by the false positives (which is $24). If it is $0, this creates a different problem: if there were 100 false negatives, then the spending from false negatives would still be $0. So “spending influenced by the false negatives” never means anything, and a model should not be used on the basis of that number for any reason.
To avoid these problems, the correct answer should match the answer explanation. The words “The overall spending influenced by the false negatives is less than the false positives” should be replaced with “The total loss is less when the model is deployed than when it is not” (which is what the explanation says) or something similar.
-
Hello JimA,
First off, thanks for your kind feedback and for explaining your thoughts in detail. Your point is well taken. Yes, the current wording is a muddled representation of what I had in mind. And that is the loss caused by leaving customers and/or incentivization and not the amount of money spent for when a customer leaves vs. the cost of incentivizing them.
We’ll improve the wording for this item.
Let me know if there’s anything I can help you with.
Regards,
Carlo @ Tutorials Dojo
Log in to reply.