Releases: clarinsi/classla

v2.1.1

10 Apr 11:10

Added reldi-tokeniser 1.0.3 as a dependency; this version resolves a bug in abbreviation loading.

v2.1

08 Aug 07:32
e3ace3b
  • Added new models for all languages
  • Added new "web" processing type
  • Fixed sentence splitting in the tokenizers

v2.0

16 Feb 18:41
  • Added new models for standard Slovenian
  • Added new inflectional lexicon for Slovenian
  • Adapted tests to new model outputs
  • Modified lexicon to store underscores instead of empty strings
  • Other changes

v1.2.0

29 Jun 11:32
  • Added SRL parsing for Slovenian
  • Fixed training for the lemmatizer and POS tagger
  • Added toy tests for all trainings
  • Other smaller fixes

v1.1.1

06 May 09:21
  • Updated external package version requirements, mainly due to updates in the Slovenian obeliks tokenizer

v1.1.0

12 Jan 09:36
  • Added tokenizer pretag option for both obeliks and reldi-tokeniser (via pos_lemma_pretag)
  • Updated the Slovene inflectional lexicon and moved it from the lemmatizer model to the morphosyntactic annotation model
  • Added upos and ufeats control to Slovene inflectional lexicon
  • Other smaller fixes

v1.0.2

07 Sep 08:21
  • Fixed an issue where the parser produced non-CoNLL-U-compliant extension labels with underscores (e.g. cc_preconj) instead of colon-separated labels (e.g. cc:preconj)
  • During lemmatization, if a token consists of a character that is not present in the seq2seq vocabulary, the lemma is now identical to the token
  • Added PUNCT control
  • Fixed MISC column bug for NER
  • Renamed punct to Z in Bulgarian UPOS
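
The first two fixes in this release can be illustrated with a minimal sketch. This is not the code used in classla; the function names `normalize_deprel` and `lemma_with_fallback`, the `vocab` set, and the `predict` callable are hypothetical stand-ins, and the fallback rule shown (copy the token when any of its characters is outside the vocabulary) is one possible reading of the release note.

```python
from typing import Callable, Set


def normalize_deprel(label: str) -> str:
    """Rewrite an underscore-separated subtype label (cc_preconj)
    into the colon-separated form (cc:preconj)."""
    # Split on the first underscore only: relation, then subtype.
    head, sep, subtype = label.partition("_")
    # Labels without a subtype are returned unchanged.
    return f"{head}:{subtype}" if sep and subtype else label


def lemma_with_fallback(token: str, vocab: Set[str],
                        predict: Callable[[str], str]) -> str:
    """Return the token itself as the lemma when it contains a character
    the (hypothetical) seq2seq vocabulary does not cover."""
    if any(ch not in vocab for ch in token):
        return token
    # Otherwise defer to the model; `predict` stands in for the lemmatizer.
    return predict(token)


if __name__ == "__main__":
    print(normalize_deprel("cc_preconj"))
    print(lemma_with_fallback("😀", set("abc"), str.upper))
```

Here `str.upper` only plays the role of a model so the example runs without any trained lemmatizer.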