Releases: aajanki/spacy-fi
Release 0.15.1
Release 0.15.0
- Compatible with spaCy 3.8
- Improved spam filter on the MC4 corpus
Evaluation scores:
TAG 96.88
POS 96.86
MORPH 92.60
LEMMA 94.10
UAS 87.62
LAS 83.52
NER P 82.55
NER R 81.53
NER F 82.04
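The NER F score is the harmonic mean of precision and recall, F = 2PR / (P + R). The Python sketch below recomputes F from the P and R values listed in these release notes as a sanity check. The `f1` helper and `NER_SCORES` table are illustrative only and are not part of spacy-fi. The numbers are copied from the notes.

```python
# NER precision, recall and F-score (P, R, F) as listed in these release notes.
NER_SCORES = {
    "0.15.0": (82.55, 81.53, 82.04),
    "0.14.0": (83.04, 81.56, 82.29),
    "0.13.0": (82.85, 81.80, 82.32),
    "0.12.0": (83.00, 81.41, 82.20),
    "0.11.0": (82.95, 81.49, 82.21),
    "0.10.0": (82.71, 81.12, 81.91),
    "0.9.0": (82.32, 80.53, 81.41),
}


def f1(precision: float, recall: float) -> float:
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


for release, (p, r, reported_f) in NER_SCORES.items():
    # P and R are themselves rounded to two decimals, so the recomputed
    # F may differ from the reported one in the last digit.
    print(f"{release}: reported F {reported_f:.2f}, recomputed {f1(p, r):.2f}")
```

All recomputed values agree with the reported ones to within 0.01. For 0.9.0 the recomputed value rounds to 81.42 rather than 81.41, most likely because the reported F was computed from unrounded precision and recall.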
Release 0.14.0
- Compatible with spaCy 3.7
- The noun chunker includes chains of flat and nmod dependencies, e.g. "maaliskuun 7. päivänä"
- The parser doesn't try to detect nsubj:outer, dislocated and goeswith dependencies anymore. There's not enough training data to learn those.
- Tokenize "-kampanja" as ["-", "kampanja"]
- Tokenize "maa-" as ["maa", "-"]
- Tokenize "/kk" as ["/", "kk"]
- Other tokenizer improvements
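spaCy's tokenizer handles cases like these with prefix and suffix rules that peel punctuation off whitespace-separated chunks. The Python sketch below imitates the three splits listed above with two hypothetical regular expressions (`PREFIX_RE`, `SUFFIX_RE`). It is a toy illustration under that assumption, not spacy-fi's actual tokenizer rules.

```python
import re

# Hypothetical, simplified rules; NOT spacy-fi's actual tokenizer code.
# A leading "-" or "/" directly followed by a word character is a prefix,
# e.g. "-kampanja" -> "-", "kampanja" and "/kk" -> "/", "kk".
PREFIX_RE = re.compile(r"^([-/])(?=\w)")
# A trailing "-" directly preceded by a word character is a suffix,
# e.g. "maa-" -> "maa", "-".
SUFFIX_RE = re.compile(r"(?<=\w)-$")


def tokenize(text: str) -> list[str]:
    """Split on whitespace, then peel off at most one prefix and one suffix."""
    tokens: list[str] = []
    for chunk in text.split():
        prefix = PREFIX_RE.match(chunk)
        if prefix:
            tokens.append(prefix.group(1))
            chunk = chunk[1:]
        if SUFFIX_RE.search(chunk):
            tokens.extend([chunk[:-1], "-"])
        else:
            tokens.append(chunk)
    return tokens
```

For example, `tokenize("maa- ja metsätalous")` returns `["maa", "-", "ja", "metsätalous"]`. A lone `"-"` is left intact because both patterns require an adjacent word character.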
Evaluation scores:
TAG 96.62
POS 96.45
MORPH 92.26
LEMMA 94.01
UAS 87.14
LAS 82.90
NER P 83.04
NER R 81.56
NER F 82.29
Release 0.13.0
- Compatible with spaCy 3.6
Evaluation scores:
TAG 96.81
POS 96.79
MORPH 92.49
LEMMA 94.16
UAS 88.55
LAS 84.18
NER P 82.85
NER R 81.80
NER F 82.32
Release 0.12.0
- Compatible with spaCy 3.5
- Fixed word occurrence probabilities (they had been broken for the past several versions)
Evaluation scores:
TAG 96.72
POS 96.69
MORPH 92.75
LEMMA 94.19
UAS 87.28
LAS 83.21
NER P 83.00
NER R 81.41
NER F 82.20
Release 0.11.0
- Ported to spaCy 3.4
- Updated word vectors and word frequencies
- Minor fixes to the lemmatization
Evaluation scores:
TAG 96.71
POS 96.85
MORPH 92.83
LEMMA 94.22
UAS 87.38
LAS 83.02
NER P 82.95
NER R 81.49
NER F 82.21
Release 0.10.0
- Floret embedding vectors trained on MC4_fi_cleaned
- Ported to spaCy 3.3.0. Older spaCy versions are not supported anymore.
Evaluation scores:
TAG 96.95
POS 96.83
MORPH 92.39
LEMMA 93.85
UAS 88.12
LAS 83.94
NER P 82.71
NER R 81.12
NER F 81.91
Release 0.10.0b1
- Ported to spaCy 3.3.0.dev0. Older spaCy versions are not supported anymore.
- Noun chunker now splits off appositions as independent phrases
Release 0.9.0
- The pipeline now includes a named-entity recognizer (NER)
Evaluation scores:
TAG 96.75
POS 96.32
MORPH 92.31
LEMMA 93.82
UAS 87.69
LAS 83.38
NER P 82.32
NER R 80.53
NER F 81.41
Release 0.8.0
- Ported to spaCy 3.2. Older spaCy versions are not supported anymore.
- Vectors for out-of-vocabulary words generated by Floret embeddings
- Replaced the custom Voikko-based morphologizer with the default spaCy morphologizer
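Floret, like fastText, represents a word through hashed character n-grams, so a vector can be built even for a word that never appeared in the training data. The plain-Python sketch below is a simplified illustration of that idea, not floret's code. Several details are assumptions: the table size, vector width, n-gram range and number of hashes per key. SHA-256 also stands in for floret's own hash function, and the table holds random numbers rather than trained vectors.

```python
import hashlib
import random

# All sizes below are illustrative assumptions, not spacy-fi's settings.
DIM = 4              # vector width
ROWS = 1000          # rows in the hashed vector table
HASH_COUNT = 2       # table rows each word/subword key is hashed into
MIN_N, MAX_N = 3, 5  # character n-gram lengths

# Random numbers stand in for trained floret vectors.
rng = random.Random(0)
TABLE = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(ROWS)]


def subwords(word: str) -> list[str]:
    """The whole word plus its character n-grams, with < and > as boundary markers."""
    padded = f"<{word}>"
    keys = [padded]
    for n in range(MIN_N, MAX_N + 1):
        keys.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return keys


def rows_for(key: str) -> list[int]:
    """Map a key to HASH_COUNT table rows (SHA-256 is a stand-in hash here)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return [
        int.from_bytes(digest[4 * k:4 * k + 4], "little") % ROWS
        for k in range(HASH_COUNT)
    ]


def vector(word: str) -> list[float]:
    """Sum the rows each key hashes to, then average over all keys of the word."""
    keys = subwords(word)
    total = [0.0] * DIM
    for key in keys:
        for row in rows_for(key):
            for d in range(DIM):
                total[d] += TABLE[row][d]
    return [x / len(keys) for x in total]
```

Because no vocabulary lookup is involved, `vector("kevätkampanja")` works even if the word was never seen. It also shares many n-gram keys, and therefore table rows, with `vector("kampanja")`, which is how related unseen words end up with related vectors.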
Evaluation scores:
TAG 96.93
POS 96.48
MORPH 92.46
LEMMA 93.84
UAS 87.60
LAS 83.33