Releases: himkt/konoha

Release v4.6.1

06 Aug 12:41
  • Added readme to pyproject.toml

Release v4.6.0

06 Aug 12:35

core feature

  • Support nagisa (#107)

documentation

  • Update some messages (#104)
  • Fix installation.rst (#106)

other

  • Skip tests using boto3 if AWS credentials are not found (#109)

Release v4.5.0

16 Jul 12:32
9f5695f

bug fixes

  • Fix dependency name and bump up version to 4.5.0. (#102)
  • Fix KyTea and SentencePiece tests using the remote feature. (#103)

Release v4.4.0

02 Jul 11:55

documentation

  • Add documentation using Sphinx. (#96)
  • Very small fix on README.md (#97, thanks @sobamchan)

other

  • Dissect tests and upgrade version of Ubuntu used in GitHub Actions. (#93)
  • Refactoring (#95)
  • Create docker directory (#98)
  • Add config for development using vscode. (#99)
  • Add dummy class for AllenNLP Token. (#100)
  • Update AllenNLP to v1. (#101)

Release v4.3.0

16 May 09:08
c36d96c

core feature

  • Add tokenization server. (#79)
  • Add type annotations. (#83)

integration

  • Support tokenizers in AllenNLP integration. (#73)

documentation

  • Update README. (#84)
  • Add reference to blog articles. (#85)
  • Update README to use shields.io. (#88)

other

  • Replace unittest with pytest. (#74)
  • Simple code-fix and modify error messages. (#86)
  • Install sudachidict_core in pip install. (#89)
  • Bump up version number to v4.3.0. (#90)

Release v4.2.0

03 May 11:21
6a6bc2f
  • Support tokenizers in AllenNLP integration. (#73)

This PR adds full support for konoha's word tokenizers in the AllenNLP integration.

Release v4.1.0

03 May 08:09
cf88e68
  • [beta] Add integration for AllenNLP. (#71)

Release v4.0.0

15 Jan 08:51
1a53f23

Support remote files (#59)

You can specify an S3 path for user_dictionary_path, system_dictionary_path, and model_path.
To use a remote path, you have to set AWS credentials.
For more information, please read [the documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html).
(Konoha supports credentials via environment variables and via a shared credentials file.)
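For example, credentials can be supplied through boto3's standard environment variables (a minimal sketch; the values below are placeholders to replace with your own keys):

```shell
# boto3 picks up these standard variables automatically.
# Replace the placeholder values with your own AWS credentials.
export AWS_ACCESS_KEY_ID=<your-access-key-id>
export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>
```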

from konoha import WordTokenizer

if __name__ == "__main__":
    sentence = "首都大学東京"

    word_tokenizer = WordTokenizer("mecab")
    print(word_tokenizer.tokenize(sentence))

    word_tokenizer = WordTokenizer("mecab", user_dictionary_path="s3://abc/xxx.dic")
    print(word_tokenizer.tokenize(sentence))

    word_tokenizer = WordTokenizer("mecab", system_dictionary_path="s3://abc/yyy")
    print(word_tokenizer.tokenize(sentence))

    word_tokenizer = WordTokenizer("sentencepiece", model_path="s3://abc/zzz.model")
    print(word_tokenizer.tokenize(sentence))

Rename the repository (#60)

The name tiny_tokenizer was ambiguous (tiny_segmenter already exists).

Release v3.0.2

24 Dec 13:08
  • Support system dictionary in MeCab #42
  • Support custom model in KyTea #49

Release v3.1.0

28 Dec 10:39
573901c
  • Use poetry for development #53
  • Support Janome, a pure-Python morphological analyzer #57