A Machine Learning Specialist is training a binary classification model for email spam filtering. During model evaluation, the Specialist noticed that the number of emails predicted as spam is significantly smaller than the number of genuine emails. The model did not generalize well when tested on actual data. This is not acceptable and does not meet the business requirements.
How can the Specialist change the model output in the easiest way to meet the business goal?
This question never specifies the business goal. If the business goal requires both high precision and high recall, then the answer marked correct, “Adjust the score threshold to tune the model performance,” won’t work. Tweaking the threshold can make the model catch more actual spam at the cost of more false positives, or it can let more genuine email pass at the cost of letting some spam through as well, but it can’t improve both at once. Without knowing what the business requirements are (and how the model is failing to meet them), we can’t tell whether tweaking the threshold will work or whether we have no choice but to go back and try to improve the model with feature engineering, data augmentation, etc.
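To make that tradeoff concrete, here’s a small self-contained Python sketch (the scores and labels are made up purely for illustration) showing precision and recall moving in opposite directions as the threshold drops:

```python
# Toy model outputs: predicted P(spam) and true labels (1 = spam, 0 = genuine).
# These numbers are invented for illustration only.
scores = [0.95, 0.80, 0.65, 0.55, 0.45, 0.35, 0.30, 0.20, 0.10, 0.05]
labels = [1,    1,    1,    0,    1,    1,    0,    0,    0,    0]

def precision_recall(threshold):
    """Classify at the given threshold, then compute precision and recall."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

for t in (0.5, 0.3):
    p, r = precision_recall(t)
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
# threshold=0.5: precision=0.75, recall=0.60
# threshold=0.3: precision=0.71, recall=1.00
```

Lowering the threshold lifts recall (more spam caught) while precision falls (more genuine email flagged), which is exactly why the threshold alone can’t satisfy a goal that demands improvement on both metrics.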
In the scenario, the ‘business goal’ is implied and refers to whatever is acceptable to the company. There are various methods that can address the issue, such as going back to the drawing board and improving the quality of the training data through resampling, data augmentation, and so on. These are all valid methods; however, the question is mainly asking for the simplest approach that can give the business quick gains toward its goal, and that approach is adjusting the threshold.
I still don’t see how the question has enough information to decide between two of the answers. Suppose the company is getting feedback from customers like “I lost my life savings to a scam that your spam filter didn’t catch; I’m leaving your company and getting a Gmail account,” and also like “I missed out on my dream job because your spam filter redirected a genuine job offer to my spam folder. I’m never using your company again; this never happened when I used Gmail!” Then the business goal might be to develop a spam filter that is as satisfactory to customers as Gmail’s, with specific metrics (I’m making these numbers up) of at least 90% of spam detected and at least 99% of genuine emails allowed to pass to customers’ inboxes.

If the business goal requires reducing both false positives and false negatives relative to the current model, then tweaking the threshold won’t work: it can only improve one at the cost of the other. I still think we need more information to answer this question.
That is assuming I understand what “Adjust the score threshold to tune the model performance” means. I’m assuming it means changing the probability threshold for the prediction: if the model currently predicts anything with P(spam) > 0.5 as spam, adjusting the score threshold would mean changing that so the model predicts anything with P(spam) > 0.4 (or any other value) as spam. That would make the model better at catching real spam, but it would also direct more genuine emails to the spam folder. If “Adjust the score threshold to tune the model performance” means something different, please explain it to me!
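Under that reading, the mechanic is just relabeling the same scores with a different cutoff. A quick sketch (the email names and probabilities are hypothetical) of lowering the threshold from 0.5 to 0.4:

```python
# Hypothetical predicted probabilities of spam for four emails;
# names and values are invented purely to illustrate the threshold mechanic.
prob_spam = {"email_a": 0.92, "email_b": 0.47, "email_c": 0.42, "email_d": 0.15}

def classify(probs, threshold):
    # An email is predicted spam when P(spam) exceeds the threshold.
    return {name: ("spam" if p > threshold else "genuine")
            for name, p in probs.items()}

print(classify(prob_spam, 0.5))  # only email_a is flagged as spam
print(classify(prob_spam, 0.4))  # email_b and email_c are now flagged too
```

The model itself is untouched; only the cutoff applied to its scores changes, which is why this is the “easiest” fix, and also why it can only trade one error type for the other.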