v1.0.2
- fixed issue where the parser produced non-CoNLL-U-compliant extension labels with underscores (e.g. `cc_preconj`) instead of colon-separated labels (e.g. `cc:preconj`)
- during lemmatization, if a token contains a character that is not present in the seq2seq vocabulary, the lemma will now be identical to the token
- added PUNCT control
- fixed MISC column bug for NER
- `punct` in Bulgarian UPOS was renamed to `Z`
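The lemmatization fallback described above can be illustrated with a minimal sketch. It assumes that a single out-of-vocabulary character anywhere in the token is enough to trigger the fallback. The names `lemmatize_with_fallback`, `vocab`, and `seq2seq_lemmatize` are hypothetical stand-ins and do not reflect the library's actual internals.

```python
from typing import Callable, Set


def lemmatize_with_fallback(
    token: str,
    vocab: Set[str],
    seq2seq_lemmatize: Callable[[str], str],
) -> str:
    """Hypothetical sketch: copy the token when the seq2seq model can't encode it."""
    # If any character of the token is missing from the seq2seq vocabulary,
    # skip the model and use the surface form itself as the lemma.
    if any(ch not in vocab for ch in token):
        return token
    # Otherwise defer to the (stand-in) seq2seq lemmatizer.
    return seq2seq_lemmatize(token)


# Toy vocabulary of ASCII letters and a stand-in "model" that just lowercases.
vocab = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
print(lemmatize_with_fallback("Running", vocab, str.lower))  # running
print(lemmatize_with_fallback("naïve", vocab, str.lower))    # naïve ('ï' is out of vocabulary)
```

Copying the token is a conservative default. A model that never saw a character cannot be expected to reproduce it, and returning the surface form at least keeps the lemma well-formed.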
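In CoNLL-U, a language-specific subtype is attached to the universal dependency relation with a colon (e.g. `cc:preconj`). The following sketch shows how an underscore-joined label could be mapped to that form. It assumes the first underscore always separates the universal relation from its subtype. The helper name `to_conllu_deprel` is hypothetical and is not taken from the parser's code.

```python
def to_conllu_deprel(label: str) -> str:
    """Hypothetical helper: convert an underscore-joined label to CoNLL-U colon form."""
    # Replace only the first underscore: the universal relation comes first,
    # and the remainder is treated as the language-specific subtype.
    return label.replace("_", ":", 1)


for raw in ["cc_preconj", "nsubj_pass", "obj"]:
    print(raw, "->", to_conllu_deprel(raw))
# cc_preconj -> cc:preconj
# nsubj_pass -> nsubj:pass
# obj -> obj
```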