Jaccard Index

As with Q-Gram distance, the input strings are first converted into sets of n-grams (sequences of n characters, also called k-shingles), but this time the number of occurrences of each n-gram is not taken into account: each input string is simply a set of n-grams. The Jaccard index is then computed as |A ∩ B| / |A ∪ B|, where A and B are the n-gram sets of the two strings.

The distance is computed as 1 - similarity, i.e. 1 - |A ∩ B| / |A ∪ B|.
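The steps above can be sketched in plain Kotlin. This is a minimal illustration, not the library's `Jaccard` implementation: `shingleSet`, `jaccardSimilarity` and `jaccardDistance` are hypothetical names, and returning a similarity of 1.0 when both n-gram sets are empty (strings shorter than n) is an assumption made here, not documented library behavior.

```kotlin
// Hypothetical helpers for illustration only; not the library's API.

/** Returns the set of distinct n-character substrings (n-grams) of [s]. */
fun shingleSet(s: String, n: Int): Set<String> {
    require(n > 0) { "n must be positive" }
    // windowed(n) yields every length-n window (none if s is shorter than n);
    // toSet() drops repeats, so how often an n-gram occurs is ignored.
    return s.windowed(n).toSet()
}

/** Jaccard similarity |A ∩ B| / |A ∪ B| of the n-gram sets of [a] and [b]. */
fun jaccardSimilarity(a: String, b: String, n: Int): Double {
    val setA = shingleSet(a, n)
    val setB = shingleSet(b, n)
    val union = setA union setB
    // Assumption: define the similarity as 1.0 when both n-gram sets are empty.
    if (union.isEmpty()) return 1.0
    val intersection = setA intersect setB
    return intersection.size.toDouble() / union.size
}

/** Jaccard distance, computed as 1 - similarity. */
fun jaccardDistance(a: String, b: String, n: Int): Double =
    1.0 - jaccardSimilarity(a, b, n)
```

For instance, `jaccardDistance("ABCDE", "ABCDF", 2)` gives 0.4 (up to floating-point rounding), while `jaccardDistance("AAAA", "AA", 2)` gives 0.0: both strings reduce to the single 2-gram set {AA}, which shows that repeated n-grams are not counted.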

The Jaccard index itself is a similarity; the Jaccard distance (1 - index) is a metric.
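For reference, the distance can also be written in terms of the symmetric difference of the two sets. The properties listed below are the standard metric axioms satisfied by the Jaccard distance on finite sets, with the usual convention that the distance between two empty sets is 0. Strictly, they hold for the n-gram sets rather than for the strings themselves: two different strings can produce the same set (for example, "ABA" and "ABAB" both give {AB, BA} for n = 2), and their distance is then 0.

$$
\begin{aligned}
d_J(A, B) &= 1 - \frac{|A \cap B|}{|A \cup B|} = \frac{|A \,\triangle\, B|}{|A \cup B|} \\
d_J(A, B) &\ge 0, \quad d_J(A, B) = 0 \iff A = B \\
d_J(A, B) &= d_J(B, A) \\
d_J(A, C) &\le d_J(A, B) + d_J(B, C)
\end{aligned}
$$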

Example

```kotlin
// n = 2: compare the strings by their sets of 2-grams
val jaccard = Jaccard(2)
println(jaccard.distance("ABCDE", "ABCDF"))
```

Output:

```
0.4
```
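The output can be checked by hand. With n = 2, the two strings give the 2-gram sets below; three 2-grams are shared out of five distinct ones:

$$
\begin{aligned}
A &= \{\mathrm{AB}, \mathrm{BC}, \mathrm{CD}, \mathrm{DE}\}, \quad B = \{\mathrm{AB}, \mathrm{BC}, \mathrm{CD}, \mathrm{DF}\} \\
|A \cap B| &= 3, \quad |A \cup B| = 5 \\
\text{similarity} &= \tfrac{3}{5} = 0.6, \quad \text{distance} = 1 - 0.6 = 0.4
\end{aligned}
$$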

Links