Naeval compares the quality and performance of NLP systems for the Russian language. It is used to evaluate the components of the Natasha project: Razdel, Navec, and Slovnet.
Naeval supports Python 3.7+.
Materials are in Russian:
See the Razdel evaluation section for more info.

| | corpora errors | corpora time | syntag errors | syntag time | gicrya errors | gicrya time | rnc errors | rnc time |
|---|---|---|---|---|---|---|---|---|
| `re.findall(\w+\|\d+\|\p+)` | 24 | 0.5 | 16 | 0.5 | 19 | 0.4 | 60 | 0.4 |
| spacy | 26 | 6.2 | 13 | 5.8 | 14 | 4.1 | 32 | 3.9 |
| nltk.word_tokenize | 60 | 3.4 | 256 | 3.3 | 75 | 2.7 | 199 | 2.9 |
| mystem | 23 | 5.0 | 15 | 4.7 | 19 | 3.7 | 14 | 3.9 |
| mosestokenizer | 11 | 2.1 | 8 | 1.9 | 15 | 1.6 | 16 | 1.7 |
| segtok.word_tokenize | 16 | 2.3 | 8 | 2.3 | 14 | 1.8 | 9 | 1.8 |
| aatimofeev/spacy_russian_tokenizer | 17 | 48.7 | 4 | 51.1 | 5 | 39.5 | 20 | 52.2 |
| koziev/rutokenizer | 15 | 1.1 | 8 | 1.0 | 23 | 0.8 | 68 | 0.9 |
| razdel.tokenize | 9 | 2.9 | 9 | 2.8 | 3 | 2.0 | 16 | 2.2 |

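The `re.findall(\w+|\d+|\p+)` row is a plain regular-expression baseline. The sketch below shows what such a baseline could look like, under one stated assumption: Python's standard `re` module does not support the `\p` class, so punctuation is approximated with `[^\w\s]+`. The name `regex_tokenize` is hypothetical, and the pattern is not necessarily the one Naeval runs.

```python
import re

# Assumption: [^\w\s]+ stands in for \p+, which the standard `re` module lacks.
# \w already matches digits, so the \d+ branch is kept only to mirror the
# pattern named in the table row.
TOKEN = re.compile(r'\w+|\d+|[^\w\s]+')


def regex_tokenize(text):
    """Split text into runs of word characters and runs of punctuation."""
    return TOKEN.findall(text)


print(regex_tokenize('Привет, мир! 123'))
```

A pattern like this has no notion of abbreviations, hyphenated words, or decimal numbers, which is presumably why it serves only as a reference point in the table.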

| | corpora errors | corpora time | syntag errors | syntag time | gicrya errors | gicrya time | rnc errors | rnc time |
|---|---|---|---|---|---|---|---|---|
| `re.split([.?!…])` | 114 | 0.9 | 53 | 0.6 | 63 | 0.7 | 130 | 1.0 |
| segtok.split_single | 106 | 17.8 | 36 | 13.4 | 1001 | 1.1 | 912 | 2.8 |
| mosestokenizer | 238 | 8.9 | 182 | 5.7 | 80 | 6.4 | 287 | 7.4 |
| nltk.sent_tokenize | 92 | 10.1 | 36 | 5.3 | 44 | 5.6 | 183 | 8.9 |
| deeppavlov/rusenttokenize | 57 | 10.9 | 10 | 7.9 | 56 | 6.8 | 119 | 7.0 |
| razdel.sentenize | 52 | 6.1 | 7 | 3.9 | 72 | 4.5 | 59 | 7.5 |

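The `re.split([.?!…])` row is likewise a naive baseline: split on every sentence-final character. A minimal sketch follows; the name `regex_sentenize` and the whitespace stripping are assumptions made for illustration, not a copy of Naeval's code.

```python
import re

# Sentence-final characters from the table row: . ? ! …
DELIMITER = re.compile(r'[.?!…]')


def regex_sentenize(text):
    """Split on every delimiter, dropping empty chunks and outer whitespace."""
    chunks = DELIMITER.split(text)
    return [chunk.strip() for chunk in chunks if chunk.strip()]


print(regex_sentenize('Он пришел в 10 ч. утра. Все спали!'))
```

The example sentence contains the abbreviation "ч.", so the sketch splits it into three chunks instead of two. Errors of this kind are what a rule-based segmenter has to handle explicitly.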
See the Navec evaluation section for more info.

| | type | init, s | get, µs | disk, mb | ram, mb | vocab |
|---|---|---|---|---|---|---|
| hudlit_12B_500K_300d_100q | navec | 1.1 | 21.6 | 50.6 | 95.3 | 500K |
| news_1B_250K_300d_100q | navec | 0.8 | 20.7 | 25.4 | 47.7 | 250K |
| ruscorpora_upos_cbow_300_20_2019 | w2v | 3.3 | 1.4 | 220.6 | 236.1 | 189K |
| ruwikiruscorpora_upos_skipgram_300_2_2019 | w2v | 5.0 | 1.5 | 290.0 | 309.4 | 248K |
| tayga_upos_skipgram_300_2_2019 | w2v | 5.2 | 1.4 | 290.7 | 310.9 | 249K |
| tayga_none_fasttextcbow_300_10_2019 | fasttext | 8.0 | 13.4 | 2741.9 | 2746.9 | 192K |
| araneum_none_fasttextcbow_300_5_2018 | fasttext | 16.4 | 10.6 | 2752.1 | 2754.7 | 195K |


| | type | simlex | hj | rt | ae | ae2 | lrwc |
|---|---|---|---|---|---|---|---|
| hudlit_12B_500K_300d_100q | navec | 0.310 | 0.707 | 0.842 | 0.931 | 0.923 | 0.604 |
| news_1B_250K_300d_100q | navec | 0.230 | 0.590 | 0.784 | 0.866 | 0.861 | 0.589 |
| ruscorpora_upos_cbow_300_20_2019 | w2v | 0.359 | 0.685 | 0.852 | 0.758 | 0.896 | 0.602 |
| ruwikiruscorpora_upos_skipgram_300_2_2019 | w2v | 0.321 | 0.723 | 0.817 | 0.801 | 0.860 | 0.629 |
| tayga_upos_skipgram_300_2_2019 | w2v | 0.429 | 0.749 | 0.871 | 0.771 | 0.899 | 0.639 |
| tayga_none_fasttextcbow_300_10_2019 | fasttext | 0.369 | 0.639 | 0.793 | 0.682 | 0.813 | 0.536 |
| araneum_none_fasttextcbow_300_5_2018 | fasttext | 0.349 | 0.671 | 0.801 | 0.706 | 0.793 | 0.579 |

See the Slovnet evaluation section for more info.

| | news | wiki | fiction | social | poetry |
|---|---|---|---|---|---|
| slovnet | 0.961 | 0.815 | 0.905 | 0.807 | 0.664 |
| slovnet_bert | 0.982 | 0.884 | 0.990 | 0.890 | 0.856 |
| deeppavlov | 0.940 | 0.841 | 0.944 | 0.870 | 0.857 |
| deeppavlov_bert | 0.951 | 0.868 | 0.964 | 0.892 | 0.865 |
| udpipe | 0.918 | 0.811 | 0.957 | 0.870 | 0.776 |
| spacy | 0.964 | 0.849 | 0.942 | 0.857 | 0.784 |
| stanza | 0.934 | 0.831 | 0.940 | 0.873 | 0.825 |
| rnnmorph | 0.896 | 0.812 | 0.890 | 0.860 | 0.838 |
| maru | 0.894 | 0.808 | 0.887 | 0.861 | 0.840 |
| rupostagger | 0.673 | 0.645 | 0.661 | 0.641 | 0.636 |


| | init, s | disk, mb | ram, mb | speed, it/s |
|---|---|---|---|---|
| slovnet | 1.0 | 27 | 115 | 532.0 |
| slovnet_bert | 5.0 | 475 | 8087 | 285.0 (gpu) |
| deeppavlov | 4.0 | 32 | 10240 | 90.0 (gpu) |
| deeppavlov_bert | 20.0 | 1393 | 8704 | 85.0 (gpu) |
| udpipe | 6.9 | 45 | 242 | 56.2 |
| spacy | 8.0 | 140 | 579 | 50.0 |
| stanza | 2.0 | 591 | 393 | 92.0 |
| rnnmorph | 8.7 | 10 | 289 | 16.6 |
| maru | 15.8 | 44 | 370 | 36.4 |
| rupostagger | 4.8 | 3 | 118 | 48.0 |

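The `init, s` and `speed, it/s` columns report load time and throughput. The sketch below shows one way such numbers could be collected for any model. It assumes that "it/s" means input items processed per second; the `benchmark` helper and the stand-in model are hypothetical and do not reproduce Naeval's own measurement code.

```python
import time


def benchmark(load, items):
    """Return (init seconds, items per second).

    `load` is a zero-argument factory that returns a callable model;
    `items` is a list of inputs passed to the model one by one.
    """
    start = time.perf_counter()
    model = load()
    init = time.perf_counter() - start

    start = time.perf_counter()
    for item in items:
        model(item)
    elapsed = time.perf_counter() - start

    # Guard against a zero reading on very fast runs.
    speed = len(items) / elapsed if elapsed > 0 else float('inf')
    return init, speed


# Stand-in "model" (str.split), used only to make the sketch runnable.
init, speed = benchmark(lambda: str.split, ['мама мыла раму'] * 1000)
print(f'init: {init:.4f} s, speed: {speed:.1f} it/s')
```

Timing the load step separately from inference matters here because several systems in the table take tens of seconds to initialize but are then comparatively fast.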

| | news uas | news las | wiki uas | wiki las | fiction uas | fiction las | social uas | social las | poetry uas | poetry las |
|---|---|---|---|---|---|---|---|---|---|---|
| slovnet | 0.907 | 0.880 | 0.775 | 0.718 | 0.806 | 0.776 | 0.726 | 0.656 | 0.542 | 0.469 |
| slovnet_bert | 0.965 | 0.936 | 0.891 | 0.828 | 0.958 | 0.940 | 0.846 | 0.782 | 0.776 | 0.706 |
| deeppavlov_bert | 0.962 | 0.910 | 0.882 | 0.786 | 0.963 | 0.929 | 0.844 | 0.761 | 0.784 | 0.691 |
| udpipe | 0.873 | 0.823 | 0.622 | 0.531 | 0.910 | 0.876 | 0.700 | 0.624 | 0.625 | 0.534 |
| spacy | 0.943 | 0.916 | 0.851 | 0.783 | 0.901 | 0.874 | 0.804 | 0.737 | 0.704 | 0.616 |
| stanza | 0.940 | 0.886 | 0.815 | 0.716 | 0.936 | 0.895 | 0.802 | 0.714 | 0.713 | 0.613 |

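UAS (unlabeled attachment score) is the share of tokens whose predicted head is correct, and LAS (labeled attachment score) additionally requires the dependency relation to match. A minimal sketch of both metrics follows; the `attachment_scores` helper and the `(head, relation)` token representation are assumptions for illustration, and details such as punctuation handling in Naeval may differ.

```python
def attachment_scores(gold, pred):
    """Return (UAS, LAS) for parallel lists of (head, relation) pairs."""
    if len(gold) != len(pred):
        raise ValueError('gold and pred must have the same number of tokens')
    if not gold:
        return 0.0, 0.0
    # UAS counts tokens with the correct head only.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS counts tokens with both the correct head and the correct relation.
    full_hits = sum(g == p for g, p in zip(gold, pred))
    total = len(gold)
    return head_hits / total, full_hits / total


gold = [(2, 'nsubj'), (0, 'root'), (2, 'obj'), (3, 'amod')]
pred = [(2, 'nsubj'), (0, 'root'), (2, 'nmod'), (2, 'amod')]
uas, las = attachment_scores(gold, pred)
print(uas, las)
```

In this toy example the third token has the right head but the wrong relation, and the fourth has the wrong head, so LAS is lower than UAS. LAS can never exceed UAS, which matches every row of the table.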

| | init, s | disk, mb | ram, mb | speed, it/s |
|---|---|---|---|---|
| slovnet | 1.0 | 27 | 125 | 450.0 |
| slovnet_bert | 5.0 | 504 | 3427 | 200.0 (gpu) |
| deeppavlov_bert | 34.0 | 1427 | 8704 | 75.0 (gpu) |
| udpipe | 6.9 | 45 | 242 | 56.2 |
| spacy | 9.0 | 140 | 579 | 41.0 |
| stanza | 3.0 | 591 | 890 | 12.0 |

See the Slovnet evaluation section for more info.

| f1 | factru PER | factru LOC | factru ORG | gareev PER | gareev ORG | ne5 PER | ne5 LOC | ne5 ORG | bsnlp PER | bsnlp LOC | bsnlp ORG |
|---|---|---|---|---|---|---|---|---|---|---|---|
| slovnet | 0.959 | 0.915 | 0.825 | 0.977 | 0.899 | 0.984 | 0.973 | 0.951 | 0.944 | 0.834 | 0.718 |
| slovnet_bert | 0.973 | 0.928 | 0.831 | 0.991 | 0.911 | 0.996 | 0.989 | 0.976 | 0.960 | 0.838 | 0.733 |
| deeppavlov | 0.910 | 0.886 | 0.742 | 0.944 | 0.798 | 0.942 | 0.919 | 0.881 | 0.866 | 0.767 | 0.624 |
| deeppavlov_bert | 0.971 | 0.928 | 0.825 | 0.980 | 0.916 | 0.997 | 0.990 | 0.976 | 0.954 | 0.840 | 0.741 |
| deeppavlov_slavic | 0.956 | 0.884 | 0.714 | 0.976 | 0.776 | 0.984 | 0.817 | 0.761 | 0.965 | 0.925 | 0.831 |
| pullenti | 0.905 | 0.814 | 0.686 | 0.939 | 0.639 | 0.952 | 0.862 | 0.683 | 0.900 | 0.769 | 0.566 |
| spacy | 0.901 | 0.886 | 0.765 | 0.970 | 0.883 | 0.967 | 0.928 | 0.918 | 0.919 | 0.823 | 0.693 |
| stanza | 0.943 | 0.865 | 0.687 | 0.953 | 0.827 | 0.923 | 0.753 | 0.734 | 0.938 | 0.838 | 0.724 |
| texterra | 0.900 | 0.800 | 0.597 | 0.888 | 0.561 | 0.901 | 0.777 | 0.594 | 0.858 | 0.783 | 0.548 |
| tomita | 0.929 | | | 0.921 | | 0.945 | | | 0.881 | | |
| mitie | 0.888 | 0.861 | 0.532 | 0.849 | 0.452 | 0.753 | 0.642 | 0.432 | 0.736 | 0.801 | 0.524 |

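The table reports F1 separately for each entity type (PER, LOC, ORG). The sketch below shows how a per-type F1 could be computed from gold and predicted spans. It assumes exact span matching on `(start, stop, type)` tuples; the `f1_by_type` helper is hypothetical, and Naeval's actual matching rules may differ.

```python
from collections import Counter


def f1_by_type(gold, pred):
    """Return {type: F1} for spans given as (start, stop, type) tuples."""
    gold, pred = set(gold), set(pred)
    # A predicted span counts as correct only on an exact match.
    correct = Counter(span[2] for span in gold & pred)
    gold_total = Counter(span[2] for span in gold)
    pred_total = Counter(span[2] for span in pred)

    scores = {}
    for type_ in sorted(set(gold_total) | set(pred_total)):
        precision = correct[type_] / pred_total[type_] if pred_total[type_] else 0.0
        recall = correct[type_] / gold_total[type_] if gold_total[type_] else 0.0
        total = precision + recall
        scores[type_] = 2 * precision * recall / total if total else 0.0
    return scores


gold = [(0, 5, 'PER'), (10, 16, 'LOC'), (20, 28, 'ORG')]
pred = [(0, 5, 'PER'), (10, 16, 'LOC'), (20, 25, 'ORG'), (30, 34, 'PER')]
scores = f1_by_type(gold, pred)
print(scores)
```

Here the ORG span has a wrong boundary and scores zero under exact matching, while the spurious second PER span lowers PER precision but not recall.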

| | init, s | disk, mb | ram, mb | speed, it/s |
|---|---|---|---|---|
| slovnet | 1.0 | 27 | 205 | 25.3 |
| slovnet_bert | 5.0 | 473 | 9500 | 40.0 (gpu) |
| deeppavlov | 5.9 | 1024 | 3072 | 24.3 (gpu) |
| deeppavlov_bert | 34.5 | 2048 | 6144 | 13.1 (gpu) |
| deeppavlov_slavic | 35.0 | 2048 | 4096 | 8.0 (gpu) |
| pullenti | 2.9 | 16 | 253 | 6.0 |
| spacy | 8.0 | 140 | 625 | 8.0 |
| stanza | 3.0 | 591 | 11264 | 3.0 (gpu) |
| texterra | 47.6 | 193 | 3379 | 4.0 |
| tomita | 2.0 | 64 | 63 | 29.8 |
| mitie | 28.3 | 327 | 261 | 32.8 |

Dev env
python -m venv ~/.venvs/natasha-naeval
source ~/.venvs/natasha-naeval/bin/activate
pip install -r requirements/dev.txt
pip install -e .
python -m ipykernel install --user --name natasha-naeval
Lint