Skip to content

Andreev's statistico-combinatorial model for morphology learning

Notifications You must be signed in to change notification settings

franckbrl/stat_comb_model

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

22 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Andreev's statistico-combinatorial model (stat_comb_model)

This program is an implementation of Andreev's statistico-combinatorial model for unsupervised learning of morphology, mainly described in:

Andreev, N. D. (1965). Statistiko-kombinatornoe modelirovanie jazykov. Nauka.
Andreev, N. D. (1967). Statistiko-kombinatornye metody v teoretičeskom i prikladnom jazykovedenii. Nauka.

If you don't know Russian and want to get the idea fast, here is my paper documenting this implementation: Unsupervized learning of morphology in the USSR.

It takes as input a corpus that is tokenized and has one sentence per line, and returns a set of classes, each containing a set of affixes that are associated to a set of stems.

Ex: Class 1

  • Affixes: ies, y
  • Stems: all, authorit, abilit, ...

USAGE:

stat_comb_model.py -i corpus.txt

About

Andreev's statistico-combinatorial model for morphology learning

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages