As with Q-Gram distance, the input strings are first converted into sets of n-grams (sequences of n characters,
also called k-shingles), but this time the number of occurrences of each n-gram is not taken into account: each
input string is simply a set of n-grams. The Jaccard index (similarity) is then computed as |A ∩ B| / |A ∪ B|,
where A and B are the n-gram sets of the two strings.
Distance is computed as 1 - similarity.
The Jaccard distance is a metric distance.
val jaccard = Jaccard(2)
println(jaccard.distance("ABCDE", "ABCDF"))
Output:
0.4
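
The result above can be checked by hand: with n = 2, "ABCDE" gives {AB, BC, CD, DE} and "ABCDF" gives {AB, BC, CD, DF}, so |A ∩ B| = 3, |A ∪ B| = 5 and the distance is 1 - 3/5 = 0.4. The following is a minimal sketch of that computation using only the Kotlin standard library. The names `ngrams` and `jaccardDistance` are hypothetical helpers, not part of the library, and the sketch does not attempt to reproduce any preprocessing the library's `Jaccard` class may apply.

```kotlin
// Hypothetical helper (not the library's API): the distinct n-grams of a string.
// Strings shorter than n produce an empty set.
fun ngrams(s: String, n: Int): Set<String> = s.windowed(n).toSet()

// Jaccard distance = 1 - |A ∩ B| / |A ∪ B| over the two n-gram sets.
fun jaccardDistance(a: String, b: String, n: Int = 2): Double {
    val setA = ngrams(a, n)
    val setB = ngrams(b, n)
    val union = setA union setB
    // Assumption: if neither string yields any n-gram, treat them as identical.
    if (union.isEmpty()) return 0.0
    val intersection = setA intersect setB
    return 1.0 - intersection.size.toDouble() / union.size
}

fun main() {
    println(jaccardDistance("ABCDE", "ABCDF")) // prints 0.4
}
```

Because the n-grams are stored in a `Set`, repeated n-grams count only once, which is what distinguishes this measure from Q-Gram distance.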