Text Similarity : Cosine Similarity

The purpose of this example is to become familiar with text processing and information retrieval by:

Calculating Term Frequency,
Tokenizing Vectors,
Calculating Cosine Similarity,
and Vector Product.

In this project, we'll:

Read two lines of text from two files, and
Tokenize them;
Read a list of stop words from another file, and
Filter them out;
Compute the cosine similarity of the two lines of text (using frequencies), and
Write the result into a file.
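
The tokenizing, filtering, and similarity steps above can be sketched as follows. This is a minimal illustration, not the code in Cosine.cpp: the names TermFreq, termFrequencies, and cosineSimilarity are hypothetical, tokens are assumed to be lowercased alphanumeric runs, and file reading and writing are left out.

```cpp
#include <cctype>
#include <cmath>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical alias: maps each term to its number of occurrences in one text.
using TermFreq = std::map<std::string, int>;

// Lowercase the text, turn every non-alphanumeric character into a space,
// split on whitespace, and count each token that is not a stop word.
TermFreq termFrequencies(const std::string& text,
                         const std::set<std::string>& stopWords) {
    std::string cleaned;
    for (unsigned char c : text) {
        cleaned += std::isalnum(c) ? static_cast<char>(std::tolower(c)) : ' ';
    }
    std::istringstream in(cleaned);
    TermFreq freq;
    std::string token;
    while (in >> token) {
        if (stopWords.count(token) == 0) {
            ++freq[token];
        }
    }
    return freq;
}

// Cosine similarity = dot(a, b) / (|a| * |b|).
// Returns 0 when either vector is empty (an assumption made here to avoid
// dividing by zero).
double cosineSimilarity(const TermFreq& a, const TermFreq& b) {
    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (const auto& kv : a) {
        normA += static_cast<double>(kv.second) * kv.second;
        auto it = b.find(kv.first);
        if (it != b.end()) {
            dot += static_cast<double>(kv.second) * it->second;
        }
    }
    for (const auto& kv : b) {
        normB += static_cast<double>(kv.second) * kv.second;
    }
    if (normA == 0.0 || normB == 0.0) {
        return 0.0;
    }
    return dot / (std::sqrt(normA) * std::sqrt(normB));
}
```

Calling termFrequencies on each input line with the same stop-word set and passing the two results to cosineSimilarity yields the value that would be written to the output file. Storing frequencies in a map keyed by term means only terms present in both texts contribute to the dot product.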

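Cosine similarity measures how similar two vectors are in terms of the angle between them. It is calculated as the dot product of the vectors divided by the product of their magnitudes, which gives a value ranging from -1 to 1, where:

1 is an exact match (the vectors point in the same direction)
-1 is an exact opposite (the vectors point in opposite directions)
0 is no match (the vectors are orthogonal, so the texts share no terms)

Because term frequencies are never negative, the similarity of two term-frequency vectors always falls between 0 and 1.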
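
As a worked illustration of the definition, the standard formula is written out below for two term-frequency vectors A and B. The symbols A, B, and the example counts are chosen here for illustration and do not come from the sample input files.

```latex
\[
\cos\theta
  = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
  = \frac{\sum_{i} A_i B_i}{\sqrt{\sum_{i} A_i^{2}} \; \sqrt{\sum_{i} B_i^{2}}}
\]

% Example: over the terms (cat, dog), let A = (2, 1) and B = (1, 2).
\[
\cos\theta
  = \frac{2 \cdot 1 + 1 \cdot 2}{\sqrt{2^{2} + 1^{2}} \; \sqrt{1^{2} + 2^{2}}}
  = \frac{4}{\sqrt{5} \, \sqrt{5}}
  = 0.8
\]
```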

Files in this repository:

Cosine.cpp
LICENSE
README.md
Sample-Input1.txt
Sample-Input2.txt
Sample-Output.txt
StopWords.txt

mrpawan-gupta/TextTo

Folders and files

Latest commit

History

Repository files navigation

Text Similarity : Cosine Similarity

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages