Skip to content

Latest commit

 

History

History
75 lines (57 loc) · 3 KB

README.md

File metadata and controls

75 lines (57 loc) · 3 KB

go-lcss logo

go-lcss

Fast Longest Common Substring algorithm in Go.

GoDoc Travis build Status Code coverage Go Report Card Apache 2.0 license

OverviewHow To UseInstallationContributionsLicense


Overview

Longest Common Substring (don't confuse with Longest Common Subsequence) is a well-known computer science problem of finding the longest substring which appears in all the given strings. It can be efficiently solved in log-linear time and linear space. The implemented algorithm uses the code to build suffix arrays which was copy-pasted from the Go's standard library - it is internal to index/suffixarray and unfortunately cannot be invoked directly. There is a blog post which gives some general understanding of that algorithm, though the actual implementation is quite different (and optimized).

How To Use

There are two functions: "basic" and "advanced". The former is straightforward:

import "gopkg.in/vmarkovtsev/go-lcss.v1"

lcss.LongestCommonSubstring([]byte("ABABC"), []byte("BABCA"), []byte("ABCBA"))
// []byte("ABC")

The "advanced" function allows to cache the suffix arrays. It can be useful since the construction of suffix arrays dominates over the run time of the algorithm:

import "gopkg.in/vmarkovtsev/go-lcss.v1"

s1, s2, s3 := []byte("ABABC"), []byte("BABCA"), []byte("ABCBA")
lcss.LongestCommonSubstringWithSuffixArrays(
	[][]byte{s1, s2, s3},
	[][]int{lcss.Qsufsort(s1), lcss.Qsufsort(s2), lcss.Qsufsort(s3)})
// []byte("ABC")

Installation

go get gopkg.in/vmarkovtsev/go-lcss.v1

The code supports building under Go >= 1.8.

Contributions

...are pretty much welcome! See contributing.md and code_of_conduct.md.

License

Apache 2.0, see LICENSE. It allows you to use this code in proprietary projects.