I was working through the Section-Based – Exploratory Data Analysis (Machine Learning) and came across the question below.
“A Machine Learning Specialist has various CSV training datasets stored in an S3 bucket. Previous models trained with similar training data size using the Amazon SageMaker XGBoost algorithm have a slow training process. The Specialist wants to decrease the amount of time spent on training the model.
Which combination of steps should be taken by the Specialist? (Select TWO.)”
The answer was given as CSV in Pipe mode or RecordIO Protobuf. But XGBoost does not support Pipe mode, nor does it support RecordIO.
A Machine Learning Specialist is training a binary classification model for a particular business use case. To help the Specialist in choosing the most optimal model, the company has given the following conditions that must be satisfied:
Choose the most cost-effective model given that false negatives are 3 times more expensive than false positives.
Choose the model with a recall rate of 85% or more.
Choose the model with a false negative rate of 15% or less.
The Specialist has generated a confusion matrix for each model for evaluation.
Which of the following confusion matrices meets the business requirements?
The answer is correct, but the formula for Recall in the explanation looks wrong:
To correctly answer this problem, we have to know three formula:
Recall = TP / (TP/FN)
False Negative Rate = FN / (FN+TP)
Should that be Recall = TP / (TP + FN)? Just checking.
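For anyone who wants to sanity-check this, here is a minimal Python sketch of the standard definitions. The counts are hypothetical numbers I made up, not the ones from the question's confusion matrices, and the function names are my own. The cost function simply assumes a false negative counts as 3 false positives, as the question states.

```python
def recall(tp: int, fn: int) -> float:
    """Recall: share of actual positives that the model caught."""
    return tp / (tp + fn)


def false_negative_rate(tp: int, fn: int) -> float:
    """False negative rate: share of actual positives that the model missed."""
    return fn / (fn + tp)


def weighted_cost(fp: int, fn: int, fn_weight: float = 3.0) -> float:
    """Relative cost when one false negative costs fn_weight false positives."""
    return fp + fn_weight * fn


# Hypothetical counts for illustration only (not from the practice question).
tp, fn, fp = 90, 10, 20

print(recall(tp, fn))               # 0.9
print(false_negative_rate(tp, fn))  # 0.1
print(weighted_cost(fp, fn))        # 50.0

# The formula as printed in the explanation, TP / (TP/FN), algebraically
# reduces to FN, which is a count and not a rate.
print(tp / (tp / fn))
```

With the corrected formula, Recall and False Negative Rate share the same denominator and add up to 1, so "recall of 85% or more" and "false negative rate of 15% or less" are the same condition stated twice.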