The Repository Contains The CPP Program to Calculate the Cosine Similarity Between two Documents Text

mrpawan-gupta/TextTo

Text Similarity : Cosine Similarity

The purpose of this example is to become familiar with text processing and information retrieval by:

  • Calculating Term Frequency,
  • Tokenizing Vectors,
  • Calculating Cosine Similarity,
  • and the Vector Product.

In this Project, we'll:

  • Read two lines of text from two files, and
  • Tokenize them;
  • Read a list of stop words from another file, and
  • Filter them out;
  • Compute the cosine similarity of the two lines of text (using frequencies), and
  • Write the result into a file.
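
The reading, tokenizing, and filtering steps above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the function names (`readFirstLine`, `loadStopWords`, `tokenize`, `termFrequency`) are hypothetical, and it assumes that tokens are lowercase alphanumeric runs and that the stop-word file holds whitespace-separated words.

```cpp
#include <cctype>
#include <fstream>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Read the first line of a text file (empty string if the file cannot be read).
std::string readFirstLine(const std::string& path) {
    std::ifstream in(path);
    std::string line;
    std::getline(in, line);
    return line;
}

// Load stop words, assuming they are separated by whitespace.
std::unordered_set<std::string> loadStopWords(const std::string& path) {
    std::unordered_set<std::string> stopWords;
    std::ifstream in(path);
    std::string word;
    while (in >> word) {
        for (char& c : word) {
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        }
        stopWords.insert(word);
    }
    return stopWords;
}

// Split text into lowercase tokens; any non-alphanumeric character is a separator.
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> tokens;
    std::string current;
    for (unsigned char ch : text) {
        if (std::isalnum(ch)) {
            current.push_back(static_cast<char>(std::tolower(ch)));
        } else if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}

// Count occurrences of every token that is not a stop word.
std::unordered_map<std::string, int> termFrequency(
    const std::vector<std::string>& tokens,
    const std::unordered_set<std::string>& stopWords) {
    std::unordered_map<std::string, int> freq;
    for (const auto& token : tokens) {
        if (stopWords.count(token) == 0) ++freq[token];
    }
    return freq;
}
```

Keeping the frequencies in a map keyed by term means the two texts do not need a shared, fixed-size vocabulary up front; a term missing from one map simply counts as zero.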

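Cosine Similarity measures how similar two vectors are in terms of the angle separating them. It is calculated as the dot product of the vectors divided by the product of their magnitudes, which gives a similarity ranging from -1 to 1, where

  • 1 is an exact match (the vectors point in the same direction),
  • 0 is unmatched (the vectors are orthogonal, so the texts share no terms), and
  • -1 is an exact opposite (the vectors point in opposite directions).

Because term frequencies are never negative, the similarity of two texts computed from frequencies always falls between 0 and 1.

Cosine similarity formula:

$$\cos(\theta) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}$$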
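
As a rough sketch of that calculation over term frequencies, the function below takes two term-to-count maps and returns their cosine similarity. The names `FrequencyMap` and `cosineSimilarity` are hypothetical and not taken from the repository. Returning 0 when either text has no terms is an assumed convention, since the formula is undefined for a zero-length vector.

```cpp
#include <cmath>
#include <string>
#include <unordered_map>

// Term -> number of occurrences in one text.
using FrequencyMap = std::unordered_map<std::string, int>;

// Cosine similarity of two term-frequency vectors stored as maps.
// Terms missing from a map are treated as having frequency 0.
double cosineSimilarity(const FrequencyMap& a, const FrequencyMap& b) {
    double dot = 0.0;
    double normASquared = 0.0;
    double normBSquared = 0.0;

    for (const auto& entry : a) {
        const double count = static_cast<double>(entry.second);
        normASquared += count * count;
        // Only terms present in both texts contribute to the dot product.
        const auto it = b.find(entry.first);
        if (it != b.end()) dot += count * static_cast<double>(it->second);
    }
    for (const auto& entry : b) {
        const double count = static_cast<double>(entry.second);
        normBSquared += count * count;
    }

    // Assumed convention: an empty text has no similarity to anything.
    if (normASquared == 0.0 || normBSquared == 0.0) return 0.0;
    return dot / (std::sqrt(normASquared) * std::sqrt(normBSquared));
}
```

For example, the frequency maps `{cat: 1, hat: 1}` and `{cat: 1, mat: 1}` share one term, so the dot product is 1 and each magnitude is √2, giving a similarity of 1 / (√2 · √2) = 0.5.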
