Version 0.3.0
aajanki committed May 17, 2020
1 parent 5260cfe commit 75a8f28
Showing 3 changed files with 54 additions and 18 deletions.
6 changes: 6 additions & 0 deletions Changelog
@@ -1,3 +1,9 @@
Version 0.3.0, 2020-05-17

* Extract noun phrases
* Lemmatize inflected abbreviations: EU:ssa => EU
* Requires SpaCy 2.2.4 or later

Version 0.2.0, 2020-01-26

* Tagging auxiliary verbs as AUX (previously VERB) following the UD convention
27 changes: 9 additions & 18 deletions README.md
@@ -1,14 +1,14 @@
# Experimental Finnish language model for SpaCy

Finnish language model for [SpaCy](https://spacy.io/). The model contains POS tagger, dependency parser, word vectors, token frequencies and a lemmatizer (libvoikko). See below for notes about NER.
Finnish language model for [SpaCy](https://spacy.io/). The model contains a POS tagger, a dependency parser, word vectors, noun phrase extraction, token frequencies and a lemmatizer (libvoikko). See below for notes about NER.
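
The noun phrase extraction mentioned above can be sketched as follows. This is a minimal illustration, not taken from the repository: it assumes the installed wheel can be loaded under the package name `fi_experimental_web_md` and that noun phrases are exposed through spaCy's standard `Doc.noun_chunks` iterator. The helper names `noun_phrases` and `demo` are hypothetical.

```python
def noun_phrases(doc):
    """Return the text of every noun phrase in a parsed spaCy Doc."""
    # Doc.noun_chunks is spaCy's standard iterator over base noun phrases.
    return [chunk.text for chunk in doc.noun_chunks]


def demo():
    # Imported lazily so that noun_phrases() itself has no hard dependency.
    import spacy

    # Assumption: the wheel installs a package named fi_experimental_web_md.
    nlp = spacy.load('fi_experimental_web_md')
    doc = nlp('Hän ajoi punaisella autolla.')
    for phrase in noun_phrases(doc):
        print(phrase)
```

Call `demo()` once libvoikko and the model wheel are installed.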

## Install the Finnish language model

First, install [the libvoikko native library with Finnish morphology data files](https://voikko.puimula.org/python.html).

Next, install the model by running:
```
pip install https://github.com/aajanki/spacy-fi/releases/download/v0.2.0/fi_experimental_web_md-0.2.0-py3-none-any.whl
pip install https://github.com/aajanki/spacy-fi/releases/download/v0.3.0/fi_experimental_web_md-0.3.0-py3-none-any.whl
```

## Usage
@@ -29,7 +29,7 @@ for t in doc:

Install [the libvoikko native library with Finnish morphology data files](https://voikko.puimula.org/python.html).

```
```sh
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
@@ -39,7 +39,7 @@ tools/download_data.sh

### Train the model

```
```sh
tools/train.sh
```

@@ -62,26 +62,17 @@ nlp = spacy.load('models/merged')
doc = nlp('Hän ajoi punaisella autolla.')
for t in doc:
    print(f'{t.lemma_}\t{t.pos_}')
```

### Build a Python package

Package just the POS tagger and dependency parser (this is the model published on GitHub):

```
tools/package_model.sh models/taggerparser/model-best
```

Alternatively, to build a model with combined tagger, parser and NER capabilities, run the following:
### Notes about the NER model

```
tools/package_model.sh models/merged
```

Notes about the NER model:
* The model is trained on a very specific domain (technology news) and its out-of-domain generalization is quite poor.
* Distributing the NER model might not be possible because the training data license (CC BY-NC-ND) is incompatible with the lemmatizer license (GPL).

### Packaging and publishing

See [packaging.md](packaging.md).

## License

All the content in this repository is available under the [GNU General Public License, version 3 or any later version](LICENSE).
39 changes: 39 additions & 0 deletions packaging.md
@@ -0,0 +1,39 @@
# Packaging

Package just the POS tagger and dependency parser (this is the model published on GitHub).

Remember to change the version below!

```sh
tools/package_model.sh models/taggerparser/model-best <<EOF
fi
experimental_web_md
0.3.0
Finnish language model: POS tagger, dependency parser, lemmatizer
Antti Ajanki
antti.ajanki@iki.fi
https://github.com/aajanki/spacy-fi
GPL v3.0
EOF
```
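
Because the version is buried in the middle of the here-document, it is easy to forget to change it. One possible way to make it an explicit parameter is sketched below. The function names `packaging_answers` and `run_packaging` are hypothetical, and the sketch assumes `tools/package_model.sh` reads its answers from standard input in exactly the order shown in the here-document above.

```python
import subprocess

# Answers in the same order as the here-document above.
# '{version}' is filled in by packaging_answers().
ANSWER_TEMPLATE = [
    'fi',
    'experimental_web_md',
    '{version}',
    'Finnish language model: POS tagger, dependency parser, lemmatizer',
    'Antti Ajanki',
    'antti.ajanki@iki.fi',
    'https://github.com/aajanki/spacy-fi',
    'GPL v3.0',
]


def packaging_answers(version):
    """Build the text fed to the packaging script's prompts."""
    lines = [line.format(version=version) for line in ANSWER_TEMPLATE]
    return '\n'.join(lines) + '\n'


def run_packaging(version, model_dir='models/taggerparser/model-best'):
    """Run the packaging script, piping the answers to its stdin."""
    subprocess.run(
        ['tools/package_model.sh', model_dir],
        input=packaging_answers(version),
        text=True,
        check=True,
    )
```

From the repository root, `run_packaging('0.3.0')` would then stand in for the shell snippet above.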

Alternatively, to build a model with combined tagger, parser and NER capabilities, run the following:

```sh
tools/package_model.sh models/merged
```

## Publishing

```sh
git tag v0.3.0
git push --tags
```

Create a new release at
<https://github.com/aajanki/spacy-fi/releases>. Upload
`models/python-package/fi_experimental_web_md-*/dist/*.whl` to the
release.

Update the pip install link on the [README](README.md).
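
The git tag, the wheel file name and the pip install link all embed the same version string, so they can be derived from one value. The sketch below only illustrates the naming pattern visible in the README link. The helper names are hypothetical and not part of the repository.

```python
BASE_URL = 'https://github.com/aajanki/spacy-fi/releases/download'


def release_tag(version):
    """Git tag for a release, e.g. 'v0.3.0'."""
    return f'v{version}'


def wheel_name(version):
    """Wheel file name as it appears in the README install link."""
    return f'fi_experimental_web_md-{version}-py3-none-any.whl'


def wheel_url(version):
    """Full pip-installable URL of the wheel attached to the release."""
    return f'{BASE_URL}/{release_tag(version)}/{wheel_name(version)}'


print('pip install ' + wheel_url('0.3.0'))
```

The printed command can be pasted into the README install section after the wheel has been uploaded.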
