pregex

Probabilistic regular expressions

Kleene star (and +) use geometric distributions on length.
Currently, score returns likelihood of most probable execution trace, rather than marginalising.
Add new primitives with Wrapper class
Primitives may be stateful (so log p(/AA/->xx) is not necessarily 2 * log p(/A/->x))

Usage:

import pregex as pre
r = pre.create("\\d+\\l+\\u+\\s")
samp = r.sample() //'3gclxbZ\t'
score = r.match("123abcABC ") //-34.486418601378595

Todo:

Compare with Dijkstra's algorithm for MAP, maybe it's faster?
Make differentiable character class
use separate bracket types for each function?
'sample' and 'marginalise' modes. Note -- for this, KleeneStar needs to be adapted to get correct score for fo?* -> foo. First calculate probability q=P(o?->ε), then multiply all partialmatches by 1/[1-q(1-p))]
Should still be able to do dynamic programming to combine partialMatches that have different states, so long as the difference in state doesn't affect the continuation
Replace namedtuples with attrs

Name		Name	Last commit message	Last commit date
Latest commit History 44 Commits
.gitignore		.gitignore
.projectignore		.projectignore
.travis.yml		.travis.yml
README.md		README.md
__init__.py		__init__.py
anaconda-project.yml		anaconda-project.yml
pregex.py		pregex.py
test_differentiate.py		test_differentiate.py
test_pregex.py		test_pregex.py