Text-distance is a Python package written in Rust that calculates the similarity between two texts. It currently supports the following algorithms:
- Bi-gram Jaccard similarity
- Cosine similarity
- Longest common subsequence
- Longest common substring
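
For readers unfamiliar with these measures, here is a minimal pure-Python sketch of what each one computes. It is not the package's Rust implementation. The function names are hypothetical, and the sketch assumes character bi-grams for Jaccard and whitespace-separated tokens for cosine similarity; the package may tokenize differently.

```python
from collections import Counter
from math import sqrt


def bigrams(text):
    """Return the set of adjacent character pairs in text (assumed tokenization)."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard_similarity(a, b):
    """Bi-gram Jaccard: size of the intersection divided by size of the union."""
    x, y = bigrams(a), bigrams(b)
    union = x | y
    # Assumption: two texts with no bi-grams at all count as identical.
    return len(x & y) / len(union) if union else 1.0


def cosine_similarity(a, b):
    """Cosine of the angle between whitespace-token count vectors."""
    x, y = Counter(a.split()), Counter(b.split())
    dot = sum(x[t] * y[t] for t in x)
    norm = sqrt(sum(v * v for v in x.values())) * sqrt(sum(v * v for v in y.values()))
    return dot / norm if norm else 0.0


def longest_common_subsequence(a, b):
    """Length of the longest run of characters appearing in order (gaps allowed) in both."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if ch_a == ch_b else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def longest_common_substring(a, b):
    """Length of the longest contiguous run of characters shared by both."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, 1):
            # A mismatch resets the run, since a substring must be contiguous.
            curr.append(prev[j - 1] + 1 if ch_a == ch_b else 0)
            best = max(best, curr[j])
        prev = curr
    return best
```

The two similarity functions return a score between 0 and 1, while the two longest-common functions in this sketch return a length in characters. Whether the package normalizes those lengths is not covered here.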
To install the Python package from source:
pip install maturin
maturin build --release
This creates a .whl file under the target folder (maturin places wheels in target/wheels by default), which you can then install in Python with pip install followed by the path to the .whl file.
Example usage:
import textdistance

# Each call below passes a list of query texts and, for each query,
# a list of candidate texts to compare it against.
cosine_similarity = textdistance.cosine_similarities(
    ["hello there"], [["hello there", "hi there"]])
jaccard_similarity = textdistance.jaccard_similarities(
    ["hello there"], [["hello there", "hi there"]])
lcsseq = textdistance.longest_common_subsequence_max(
    ["hello there"], [["hello there", "hi there"]])
lcsstr = textdistance.longest_common_substring_max(
    ["hello there"], [["hello there", "hi there"]])