Now we're getting a bit more technical, but let's see if we can explain this in a way that's easy to digest.
TF-IDF (short for term frequency-inverse document frequency) is a numerical statistic intended to reflect how important a word or phrase is to a document in a collection, or corpus. In other words, it measures how significant a given word or phrase is within one document relative to the entire collection that document belongs to.
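To make the idea concrete, here is a minimal Python sketch of one common TF-IDF variant. It assumes the documents are already split into lowercase words, uses the raw count of a term as its term frequency, and uses log(N / df) as the inverse document frequency, where N is the number of documents and df is how many of them contain the term. Real libraries often add smoothing or normalization, so their numbers will differ. The function name tf_idf and the sample documents are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return a TF-IDF weight for every term in every tokenized document.

    Assumed variant (one of several in use):
      tf  = raw count of the term in the document
      idf = log(N / df), N = number of documents, df = documents containing the term
    """
    n_docs = len(docs)
    # Document frequency: count each term at most once per document.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return weights


# Hypothetical example corpus, already lowercased and tokenized.
docs = [
    "the brown cow".split(),
    "the black cat".split(),
    "the brown dog and the brown cow".split(),
]
weights = tf_idf(docs)
print(round(weights[0]["the"], 3))    # 0.0 ("the" appears in every document)
print(round(weights[2]["brown"], 3))  # 0.811 (2 occurrences * log(3 / 2))
```

A word that shows up in every document, like "the", ends up with a weight of zero, while a word that is frequent in one document but rarer across the collection, like "brown" in the third document, gets a higher weight.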
Suppose we have a set of English text documents and wish to determine which document is most relevant to the query "the brown cow". A simple way to start out is by eliminating documents that do not contain all three words "the", "brown", and "cow", but this still leaves many documents. To further distinguish them, we might count the number of times each term occurs in each document; the number of times a term occurs in a document is called its term frequency.
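The two steps described above, filtering out documents that lack any query term and then counting term frequencies, can be sketched in a few lines of Python. This is a minimal illustration that assumes the documents are already lowercased and split on whitespace; the document names and contents are hypothetical.

```python
from collections import Counter

# Hypothetical example documents, already lowercased and tokenized.
documents = {
    "doc1": "the brown cow jumped over the brown fence".split(),
    "doc2": "the cow ate grass".split(),
    "doc3": "a brown cow and the farmer".split(),
}
query = ["the", "brown", "cow"]

# Step 1: keep only documents that contain every query term.
candidates = {
    name: words
    for name, words in documents.items()
    if all(term in words for term in query)
}

# Step 2: count how often each query term occurs in each remaining document
# (its term frequency).
term_frequencies = {}
for name, words in candidates.items():
    counts = Counter(words)
    term_frequencies[name] = {term: counts[term] for term in query}

for name, tf in term_frequencies.items():
    print(name, tf)
# doc1 {'the': 2, 'brown': 2, 'cow': 1}
# doc3 {'the': 1, 'brown': 1, 'cow': 1}
```

Notice that "the" counts just as heavily as "brown" in doc1, even though it says very little about whether a document is relevant. Raw term frequency alone cannot tell common words from meaningful ones, which is the gap the inverse document frequency part of TF-IDF is meant to close.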