Hello zzzz,
Thank you for pointing this out, and we sincerely apologize for the confusion this may have caused. We will be updating the explanation to reflect the correct definitions.
Term Frequency – Inverse Document Frequency (TF-IDF) is a way to turn text into numerical features for machine learning models. Term Frequency (TF) measures how often a word appears in a document, usually divided by the total number of words in that document. Inverse Document Frequency (IDF) measures how rare or common a word is across all documents in the dataset, giving lower scores to words that appear in many documents and higher scores to words that appear in fewer documents. Multiplying TF by IDF gives a score that highlights words that are frequent in one document but uncommon across the whole collection.
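To make the definitions above concrete, here is a minimal sketch in plain Python. The three-sentence corpus and the helper names (tf, idf, tf_idf) are made up for illustration, and the IDF uses the basic textbook form log(N / df), which assumes the term appears in at least one document. Libraries often use slightly different variants of this formula.

```python
import math
from collections import Counter

# Hypothetical toy corpus, invented purely for illustration.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]
tokenized = [d.split() for d in docs]
n_docs = len(tokenized)

def tf(term, tokens):
    # Term frequency: count of the term divided by the document length.
    return Counter(tokens)[term] / len(tokens)

def idf(term):
    # Inverse document frequency, textbook form: log(N / df).
    # Assumes the term occurs in at least one document (df > 0).
    df = sum(1 for tokens in tokenized if term in tokens)
    return math.log(n_docs / df)

def tf_idf(term, tokens):
    # TF-IDF is simply the product of the two quantities.
    return tf(term, tokens) * idf(term)

first_doc = tokenized[0]
score_the = tf_idf("the", first_doc)  # "the" is in every document
score_mat = tf_idf("mat", first_doc)  # "mat" is in only one document
print(score_the, score_mat)
```

In this toy corpus, "the" appears in every document, so its IDF is log(3/3) = 0 and its TF-IDF score is zero even though it is the most frequent word in the first sentence. "mat" appears in only one document, so it receives a positive score.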
In this scenario, using scikit-learn’s TfidfVectorizer helps reduce the weight of very common words while giving more importance to distinctive words, which can improve model accuracy.
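As a rough sketch of how this looks with TfidfVectorizer, assuming scikit-learn is installed and reusing a made-up toy corpus: note that with its default settings, scikit-learn uses a smoothed IDF, ln((1 + N) / (1 + df)) + 1, and L2-normalizes each row, so the numbers will not match the basic textbook formula exactly. Common words are down-weighted rather than zeroed out.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus, invented purely for illustration.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]

vectorizer = TfidfVectorizer()          # default settings: smoothed IDF, L2 norm
X = vectorizer.fit_transform(docs)      # sparse matrix: one row per document

vocab = vectorizer.vocabulary_          # maps each term to its column index
idf = vectorizer.idf_                   # learned IDF weight for each column

# "the" occurs in every document; "mat" occurs in only one.
idf_the = idf[vocab["the"]]
idf_mat = idf[vocab["mat"]]
print(X.shape, idf_the, idf_mat)
```

Here the word that appears in every document gets the lowest IDF weight, while the word that appears in only one document gets a higher one, which is the down-weighting effect described above.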
If you notice anything else or need additional assistance, please feel free to reach out to us.
Regards,
Nikee @ Tutorials Dojo