Skip to content

vmarkovtsev/go-lcss

Repository files navigation

go-lcss logo

go-lcss

Fast Longest Common Substring algorithm in Go.

GoDoc Travis build Status Code coverage Go Report Card Apache 2.0 license

OverviewHow To UseInstallationContributionsLicense


Overview

Longest Common Substring (don't confuse with Longest Common Subsequence) is a well-known computer science problem of finding the longest substring which appears in all the given strings. It can be efficiently solved in log-linear time and linear space. The implemented algorithm uses the code to build suffix arrays which was copy-pasted from the Go's standard library - it is internal to index/suffixarray and unfortunately cannot be invoked directly. There is a blog post which gives some general understanding of that algorithm, though the actual implementation is quite different (and optimized).

How To Use

There are two functions: "basic" and "advanced". The former is straightforward:

import "gopkg.in/vmarkovtsev/go-lcss.v1"

lcss.LongestCommonSubstring([]byte("ABABC"), []byte("BABCA"), []byte("ABCBA"))
// []byte("ABC")

The "advanced" function allows to cache the suffix arrays. It can be useful since the construction of suffix arrays dominates over the run time of the algorithm:

import "gopkg.in/vmarkovtsev/go-lcss.v1"

s1, s2, s3 := []byte("ABABC"), []byte("BABCA"), []byte("ABCBA")
lcss.LongestCommonSubstringWithSuffixArrays(
	[][]byte{s1, s2, s3},
	[][]int{lcss.Qsufsort(s1), lcss.Qsufsort(s2), lcss.Qsufsort(s3)})
// []byte("ABC")

Installation

go get gopkg.in/vmarkovtsev/go-lcss.v1

The code supports building under Go >= 1.8.

Contributions

...are pretty much welcome! See contributing.md and code_of_conduct.md.

License

Apache 2.0, see LICENSE. It allows you to use this code in proprietary projects.

About

Fast Longest Common Substring in Go

Topics

Resources

License

Code of conduct

Stars

Watchers

Forks

Packages

No packages published

Languages