luckasRanarison/kaiseki

kaiseki

kaiseki (解析) is a Japanese tokenizer and morphological analyzer using mecab-ipadic, inspired by this article.

Usage

kaiseki supports both morpheme tokenization and word tokenization (inflections included). It also provides additional information from the mecab dictionary, such as part of speech and conjugation form.

use kaiseki::{Tokenizer, error::Error};

fn main() -> Result<(), Error> {
    let tokenizer = Tokenizer::new()?;
    let morphemes = tokenizer.tokenize("東京都に住んでいる");
    let morphemes: Vec<_> = morphemes.iter().map(|m| &m.text).collect();

    println!("{:?}", morphemes); // ["東京", "都", "に", "住ん", "で", "いる"]

    let words = tokenizer.tokenize_word("東京都に住んでいる"); 
    let words: Vec<_> = words.iter().map(|w| &w.text).collect();

    println!("{:?}", words); // ["東京", "都", "に", "住んでいる"]

    Ok(())
}
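
To make the difference between the two modes concrete, here is a minimal, standalone sketch of how morphemes could be merged into words. It is not kaiseki's actual implementation: the `Pos` enum, the `Morpheme` struct and `merge_into_words` are hypothetical names invented for this illustration, and the part-of-speech set is far simpler than what mecab-ipadic provides. The assumed rule is that a verb absorbs the conjunctive particles, auxiliary verbs and dependent verbs that directly follow it.

```rust
/// Simplified part-of-speech tags (hypothetical; mecab-ipadic is much richer).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Pos {
    Noun,
    Verb,
    Particle,
    ConjunctiveParticle, // e.g. て / で directly after a verb stem
    AuxiliaryVerb,       // e.g. た / だ, ます
    DependentVerb,       // e.g. いる in 〜ている
}

/// A morpheme as a tokenizer might return it (hypothetical shape).
#[derive(Debug, Clone)]
struct Morpheme {
    text: String,
    pos: Pos,
}

/// Merges inflection morphemes into the verb that precedes them.
fn merge_into_words(morphemes: &[Morpheme]) -> Vec<String> {
    let mut words: Vec<String> = Vec::new();
    // True while the last word is a verb that can still absorb inflections.
    let mut in_verb = false;

    for m in morphemes {
        let attaches = in_verb
            && matches!(
                m.pos,
                Pos::ConjunctiveParticle | Pos::AuxiliaryVerb | Pos::DependentVerb
            );

        if attaches {
            // Extend the current verb; `in_verb` stays true for chained inflections.
            if let Some(last) = words.last_mut() {
                last.push_str(&m.text);
            }
        } else {
            // Start a new word; only a verb opens an inflection chain.
            words.push(m.text.clone());
            in_verb = m.pos == Pos::Verb;
        }
    }

    words
}
```

Under these assumptions, the morphemes 東京 (noun), 都 (noun), に (particle), 住ん (verb), で (conjunctive particle) and いる (dependent verb) are grouped into the words 東京, 都, に and 住んでいる, the same grouping as the `tokenize_word` output shown above.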

Test

cargo test

Credits

Articles

License

MIT License.
