Beanpiece: A Java binding to Google SentencePiece
SentencePiece is an unsupervised text tokenizer and detokenizer developed by Google. Beanpiece provides a Java API for SentencePiece.
As of version 0.2, this library is API-compatible with SentencePiece commit 1ff5904 (Apr 1, 2018).
The following tools are required to build Beanpiece:
- sbt
- a g++ compiler with C++11 support
To build the project, run:
sbt package
This runs every task the build needs, from compiling and copying the shared libraries to packaging the Java source code.
As of version 0.2, the project contains libsentencepiece.so for Linux (amd64) only.
As a result, the built jar will not run on OS X or Windows; support for those platforms is planned for version 0.3.
Until then, build the SentencePiece shared library yourself and copy it into the matching directory:
- windows:
/library/windows/[i386|amd64|ppc]
- osx:
/library/osx/[i386|amd64|ppc]
After that, build the project as described above.
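To work out which directory applies to a given machine, the JVM's `os.name` and `os.arch` system properties can be mapped onto the /library/<os>/<arch> layout. The sketch below is a hypothetical helper, not part of Beanpiece: the class and method names are invented for illustration, the `osx` directory name for macOS is an assumption, and so is the mapping of architecture strings (for example `x86_64` to `amd64`, `i686` to `i386`). It uses the standard `System.mapLibraryName` to get the platform-specific file name.

```java
import java.util.Locale;

/**
 * Hypothetical helper, not part of Beanpiece: decides whether the bundled
 * Linux (amd64) library covers this JVM and, if not, where a self-built
 * SentencePiece library would go under /library/<os>/<arch>.
 */
public final class NativeLibraryPath {

    private NativeLibraryPath() {}

    /** True for the only platform the 0.2 jar ships a binary for. */
    static boolean isBundledPlatform(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);
        return os.startsWith("linux") && (arch.equals("amd64") || arch.equals("x86_64"));
    }

    /** Maps os.name to a platform directory; "osx" for macOS is an assumption. */
    static String osDirectory(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.startsWith("windows")) {
            return "windows";
        }
        if (os.startsWith("mac") || os.contains("darwin")) {
            return "osx";
        }
        throw new IllegalArgumentException("No manual-build directory for OS: " + osName);
    }

    /** Maps os.arch to one of the directories i386, amd64 or ppc (assumed mapping). */
    static String archDirectory(String osArch) {
        switch (osArch.toLowerCase(Locale.ROOT)) {
            case "x86": case "i386": case "i486": case "i586": case "i686":
                return "i386";
            case "amd64": case "x86_64":
                return "amd64";
            case "ppc": case "powerpc":
                return "ppc";
            default:
                throw new IllegalArgumentException("Unsupported architecture: " + osArch);
        }
    }

    /** Destination path for a library file, e.g. /library/windows/amd64/sentencepiece.dll. */
    static String destination(String osName, String osArch, String libraryFile) {
        return "/library/" + osDirectory(osName) + "/" + archDirectory(osArch) + "/" + libraryFile;
    }

    public static void main(String[] args) {
        String os = System.getProperty("os.name");
        String arch = System.getProperty("os.arch");
        if (isBundledPlatform(os, arch)) {
            System.out.println("The bundled libsentencepiece.so targets this platform: " + os + "/" + arch);
            return;
        }
        // mapLibraryName yields the platform's file name, e.g. sentencepiece.dll on Windows.
        String file = System.mapLibraryName("sentencepiece");
        try {
            System.out.println("Copy " + file + " to " + destination(os, arch, file));
        } catch (IllegalArgumentException e) {
            // Platforms outside the layout (e.g. Linux on aarch64) end up here.
            System.out.println(e.getMessage());
        }
    }
}
```

On Windows with a 64-bit JVM, for example, `main` would suggest copying `sentencepiece.dll` to `/library/windows/amd64/`. Keeping the mapping in pure static methods makes it easy to check without a native library present.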