Fast Longest Common Substring algorithm in Go.
Overview • How To Use • Installation • Contributions • License
Longest Common Substring
(don't confuse with Longest Common Subsequence) is
a well-known computer science problem of finding the longest substring which appears in all
the given strings. It can be efficiently solved in log-linear time and linear space.
The implemented algorithm uses the code to build suffix arrays
which was copy-pasted from the Go's standard library - it
is internal to index/suffixarray
and unfortunately cannot be invoked directly.
There is a blog post
which gives some general understanding of that algorithm, though the actual implementation is quite different
(and optimized).
There are two functions: "basic" and "advanced". The former is straightforward:
import "gopkg.in/vmarkovtsev/go-lcss.v1"
lcss.LongestCommonSubstring([]byte("ABABC"), []byte("BABCA"), []byte("ABCBA"))
// []byte("ABC")
The "advanced" function allows to cache the suffix arrays. It can be useful since the construction of suffix arrays dominates over the run time of the algorithm:
import "gopkg.in/vmarkovtsev/go-lcss.v1"
s1, s2, s3 := []byte("ABABC"), []byte("BABCA"), []byte("ABCBA")
lcss.LongestCommonSubstringWithSuffixArrays(
[][]byte{s1, s2, s3},
[][]int{lcss.Qsufsort(s1), lcss.Qsufsort(s2), lcss.Qsufsort(s3)})
// []byte("ABC")
go get gopkg.in/vmarkovtsev/go-lcss.v1
The code supports building under Go >= 1.8.
...are pretty much welcome! See contributing.md and code_of_conduct.md.
Apache 2.0, see LICENSE. It allows you to use this code in proprietary projects.