Releases: himkt/konoha
Release v4.6.1
- Added readme to pyproject.toml
Release v4.6.0
Release v4.5.0
Release v4.4.0
documentation
- Add documentation using Sphinx. (#96)
- Very small fix on README.md (#97, thanks @sobamchan)
other
Release v4.3.0
Release v4.2.0
- Support tokenizers in AllenNLP integration. (#73)
  This PR adds full support for konoha word tokenizers.
Release v4.1.0
- [beta] Add integration for AllenNLP. (#71)
Release v4.0.0
Support remote files (#59)
You can specify an S3 path for `user_dictionary_path`, `system_dictionary_path`, and `model_path`.
To use a remote path, you have to set AWS credentials.
For more information, please read [the documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html).
(Konoha supports environment variables and a shared credentials file.)
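As a quick reference, the two credential mechanisms mentioned above can be set up as follows. This is a generic sketch of boto3's standard configuration, not konoha-specific; all values are placeholders.

```shell
# Option 1: environment variables, read automatically by boto3.
export AWS_ACCESS_KEY_ID="your-access-key-id"         # placeholder
export AWS_SECRET_ACCESS_KEY="your-secret-access-key" # placeholder

# Option 2: shared credentials file at ~/.aws/credentials, e.g.:
#
#   [default]
#   aws_access_key_id = your-access-key-id
#   aws_secret_access_key = your-secret-access-key
```

Either mechanism is picked up by boto3 without any extra configuration in your code.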
```python
from konoha import WordTokenizer

if __name__ == "__main__":
    sentence = "首都大学東京"

    word_tokenizer = WordTokenizer("mecab")
    print(word_tokenizer.tokenize(sentence))

    word_tokenizer = WordTokenizer("mecab", user_dictionary_path="s3://abc/xxx.dic")
    print(word_tokenizer.tokenize(sentence))

    word_tokenizer = WordTokenizer("mecab", system_dictionary_path="s3://abc/yyy")
    print(word_tokenizer.tokenize(sentence))

    word_tokenizer = WordTokenizer("sentencepiece", model_path="s3://abc/zzz.model")
    print(word_tokenizer.tokenize(sentence))
```
Rename repository (#60)
The name `tiny_tokenizer` is ambiguous (`tiny_segmenter` already exists).