
A Java library for computing and comparing Nilsimsa string similarity hashes.

weblyzard/nilsimsa

Nilsimsa for Java

Nilsimsa is a locality-sensitive hash (LSH) function. Unlike a cryptographic hash, it maps similar inputs to similar digests, so the number of bits in which two digests differ indicates how similar the original texts are. This library computes the Nilsimsa hash of a string and compares two such hashes.
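To illustrate the locality-sensitive idea, here is a toy digest built on an accumulator-and-threshold scheme: every trigram of the input is hashed into one of 256 buckets, and bit i of a 256-bit digest is set when bucket i holds more than the mean count. This is a minimal sketch, not Nilsimsa: the trigram hash `bucket` is an arbitrary stand-in (Nilsimsa uses its own trigram hash and window), so the digests will not match this library's output. `ToyLsh`, `digest` and `bitwiseDifference` here are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

/**
 * Toy locality-sensitive digest (NOT Nilsimsa). Similar strings share most
 * trigrams, so they fill mostly the same buckets and get mostly the same bits.
 */
final class ToyLsh {

    // Hypothetical stand-in trigram hash: maps three bytes to a bucket in 0..255.
    static int bucket(int a, int b, int c) {
        int h = (a * 31 + b) * 31 + c;
        return (h ^ (h >>> 8)) & 0xFF;
    }

    /** Returns a 256-bit digest packed into 32 bytes. */
    static byte[] digest(String text) {
        byte[] in = text.getBytes(StandardCharsets.UTF_8);
        int[] acc = new int[256];
        int trigrams = 0;
        // Count every consecutive trigram into one of 256 buckets.
        for (int i = 0; i + 2 < in.length; i++) {
            acc[bucket(in[i] & 0xFF, in[i + 1] & 0xFF, in[i + 2] & 0xFF)]++;
            trigrams++;
        }
        // The mean bucket count is the threshold; set bit i if bucket i exceeds it.
        double threshold = trigrams / 256.0;
        byte[] out = new byte[32];
        for (int i = 0; i < 256; i++) {
            if (acc[i] > threshold) {
                out[i >> 3] |= (byte) (1 << (i & 7));
            }
        }
        return out;
    }

    /** Counts the bits in which two equal-length digests differ. */
    static int bitwiseDifference(byte[] x, byte[] y) {
        int bits = 0;
        for (int i = 0; i < x.length; i++) {
            bits += Integer.bitCount((x[i] ^ y[i]) & 0xFF);
        }
        return bits;
    }

    public static void main(String[] args) {
        byte[] a = digest("A short test message");
        System.out.println(bitwiseDifference(a, digest("A short test message!")));
        System.out.println(bitwiseDifference(a, digest("Something completely different")));
    }
}
```

Appending one character adds a single trigram, so in this toy scheme the two near-identical messages differ in at most one bit, while unrelated text lands in largely different buckets.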

This code is a Java port of Michael Itz's Python Nilsimsa implementation: http://code.google.com/p/py-nilsimsa.

The original C implementation, nilsimsa-0.2.4, was written by cmeclax: http://ixazon.dynip.com/~cmeclax/nilsimsa.html

Requirements

  • Java 8+

Maven Dependency

 <dependency>
     <groupId>com.weblyzard.lib.string</groupId>
     <artifactId>nilsimsa</artifactId>
     <version>0.0.5</version>
 </dependency>

Examples

Compute and output the Nilsimsa hash

String text = "A short test message"; 
Nilsimsa n = Nilsimsa.getHash(text);
System.out.println("Nilsimsa hash for message '" + text + "': " + n.hexdigest());
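`hexdigest()` turns the hash into a printable string, which is convenient for storing hashes (for example in a database) and comparing them later without the original text. A Nilsimsa digest is 256 bits, so its hex form presumably has 64 characters, two per byte. The sketch below shows a hex round trip between a `byte[]` digest and such a string; `HexDigest`, `toHex` and `fromHex` are hypothetical helpers, not part of this library, and they make no claim to match the library's character case or byte order.

```java
/** Hypothetical hex helpers for storing and reloading a digest. */
final class HexDigest {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    /** Encodes each byte as two lowercase hex characters. */
    static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]);
        }
        return sb.toString();
    }

    /** Decodes a hex string (either case) back into bytes. */
    static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd-length hex string");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("non-hex character near index " + 2 * i);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }
}
```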

Text similarity

Nilsimsa first = Nilsimsa.getHash("A short test message");
Nilsimsa second = Nilsimsa.getHash("A short test message!");
Nilsimsa third = Nilsimsa.getHash("Something completely different");

System.out.println(first.bitwiseDifference(first));    // 0
System.out.println(first.bitwiseDifference(second));   // 3
System.out.println(first.bitwiseDifference(third));    // 133
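The outputs above (0 for identical input, 3 for a one-character change, 133 for unrelated text) are consistent with `bitwiseDifference` counting the bits in which two 256-bit digests differ, i.e. their Hamming distance. Assuming that, and assuming both digests were stored as strings from `hexdigest()`, the sketch below computes the same kind of count directly from two hex strings. `HexHamming.hammingDistance` is a hypothetical helper, not part of this library, and it has not been checked against `bitwiseDifference` output.

```java
/** Hypothetical helper: Hamming distance between two hex-encoded digests. */
final class HexHamming {
    static int hammingDistance(String hexA, String hexB) {
        if (hexA.length() != hexB.length()) {
            throw new IllegalArgumentException("digests must have equal length");
        }
        int bits = 0;
        for (int i = 0; i < hexA.length(); i++) {
            int a = Character.digit(hexA.charAt(i), 16);
            int b = Character.digit(hexB.charAt(i), 16);
            if (a < 0 || b < 0) {
                throw new IllegalArgumentException("non-hex character at index " + i);
            }
            // Each hex digit encodes 4 bits; XOR leaves a 1 wherever they differ.
            bits += Integer.bitCount(a ^ b);
        }
        return bits;
    }
}
```

For 64-character digests the result ranges from 0 to 256. `Character.digit` accepts upper- and lower-case hex digits alike, so the case of the stored strings does not matter.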

Determine whether two strings are significantly different

import java.util.Arrays;
import java.util.List;

List<String> testList = Arrays.asList("A short test message",
                                      "A short test message!",
                                      "Something completely different");

for (String firstString : testList) {
    // Hash the outer string once per outer iteration instead of once per pair
    Nilsimsa firstHash = Nilsimsa.getHash(firstString);
    for (String secondString : testList) {
        Nilsimsa secondHash = Nilsimsa.getHash(secondString);

        System.out.println("The hashes of '" + firstString + "' and '"
            + secondString + "' differ in " + firstHash.bitwiseDifference(secondHash) + " bits.");
    }
}
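The loop above prints the raw bit difference for every ordered pair, including each string compared with itself. Assuming `bitwiseDifference` is a Hamming distance (and therefore symmetric, with 0 for identical input as shown earlier), visiting each unordered pair once is enough. Actually deciding that two strings are "significantly different" also needs a cut-off; this README does not suggest one, so the sketch takes the threshold as a parameter you would tune on your own data. `PairwiseCheck` and `differentPairs` are hypothetical names, not part of this library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntBiFunction;

/** Hypothetical helper: flags item pairs whose distance exceeds a threshold. */
final class PairwiseCheck {
    /** Returns index pairs {i, j} with i < j whose distance is above threshold. */
    static <T> List<int[]> differentPairs(List<T> items,
                                          ToIntBiFunction<T, T> distance,
                                          int threshold) {
        List<int[]> result = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            // Start at i + 1: skips self-comparisons and mirrored duplicates.
            for (int j = i + 1; j < items.size(); j++) {
                if (distance.applyAsInt(items.get(i), items.get(j)) > threshold) {
                    result.add(new int[] {i, j});
                }
            }
        }
        return result;
    }
}
```

With this library, one option is to hash each string once into a `List<Nilsimsa>` and pass `Nilsimsa::bitwiseDifference` as the distance function, assuming that method returns an `int`.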

Changelog

Please refer to the releases page.