Andreev's statistico-combinatorial model (stat_comb_model)

This program is an implementation of Andreev's statistico-combinatorial model for unsupervised learning of morphology, mainly described in:

Andreev, N. D. (1965). Statistiko-kombinatornoe modelirovanie jazykov. Nauka.
Andreev, N. D. (1967). Statistiko-kombinatornye metody v teoretičeskom i prikladnom jazykovedenii. Nauka.

If you don't read Russian and want the gist quickly, see my paper documenting this implementation: Unsupervized learning of morphology in the USSR.

The program takes as input a tokenized corpus with one sentence per line and returns a set of classes, each pairing a set of affixes with the set of stems they attach to.

Ex: Class 1

Affixes: ies, y
Stems: all, authorit, abilit, ...
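
To make the shape of one output class concrete, here is a minimal sketch that rebuilds the word forms covered by Class 1. It is not taken from the program: the `MorphClass` container and its `word_forms` method are hypothetical names, and it assumes the affixes are suffixes appended directly to the stem, as in the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MorphClass:
    """Hypothetical container for one output class (not the program's own type)."""

    affixes: tuple[str, ...]
    stems: tuple[str, ...]

    def word_forms(self) -> list[str]:
        """Concatenate every stem with every affix (assumes suffixation)."""
        return [stem + affix for stem in self.stems for affix in self.affixes]


# Class 1 from the example, limited to the three stems that are listed.
class_1 = MorphClass(affixes=("ies", "y"), stems=("all", "authorit", "abilit"))

print(class_1.word_forms())
# ['allies', 'ally', 'authorities', 'authority', 'abilities', 'ability']
```

Each stem combines with each affix of its class, so three stems and two affixes account for six word forms.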

USAGE:

stat_comb_model.py -i corpus.txt
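
If your text is not yet in the expected input format, the sketch below shows one way to produce it. It is an illustration only: `tokenize` and `write_corpus` are hypothetical helpers that are not part of this repository, and the tokenization rule (tokens separated by single spaces, punctuation split off) is an assumption, since the exact scheme the program expects is not specified here.

```python
import re
from pathlib import Path


def tokenize(sentence: str) -> str:
    """Split a sentence into word and punctuation tokens joined by spaces.

    Assumption: whitespace-separated tokens are what the program expects.
    """
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    return " ".join(tokens)


def write_corpus(sentences: list[str], path: str) -> None:
    """Write one tokenized sentence per line to `path` (UTF-8)."""
    lines = [tokenize(sentence) for sentence in sentences]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


print(tokenize("The authorities have allies."))
# The authorities have allies .
```

Calling `write_corpus(sentences, "corpus.txt")` would then give a file that can be passed to the `-i` option shown above.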


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Andreev's statistico-combinatorial model (stat_comb_model)

About

Releases

Packages

Languages

franckbrl/stat_comb_model

Folders and files

Latest commit

History

Repository files navigation

Andreev's statistico-combinatorial model (stat_comb_model)

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages