A short Python script to count the unique kanji in a text.
The script determines how many unique kanji a given text contains and returns a list of those kanji.
It skips Hiragana, Katakana, the Latin alphabet, Arabic numerals, and punctuation marks, so none of these are included in the count.
To use the script:
- Copy and paste the script into your preferred Python interpreter.
- Insert your text between the quotation marks where it says "text". The text must be on a single line; an online converter such as https://lingojam.com/TexttoOneLine can do that for you.
- Run the script, and voila: you now know how many unique kanji the text contains and what they are.
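If you would rather not use an online converter, the single-line step can probably be done in Python itself. The sketch below assumes the text is pasted into a triple-quoted string; the name `raw` and the sample sentences are made up for the example.

```python
# Triple quotes let Python accept pasted text that spans several lines.
raw = """吾輩は猫である。
名前はまだ無い。"""

# Join the lines back together, dropping the line breaks.
text = "".join(raw.splitlines())
print(text)  # 吾輩は猫である。名前はまだ無い。
```

For text in a language that separates words with spaces, joining with `" "` instead of `""` avoids gluing the last word of one line to the first word of the next.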
Note: To count unique Kana (or any other characters) as well, remove them from the "skip" list. Conversely, to exclude additional characters from the count, add them to "skip".
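The effect of editing "skip" can be illustrated with a small example. The function name `unique_chars` and the tiny `skip` set are hypothetical and kept short for readability; the real list in the script is much longer.

```python
def unique_chars(text, skip):
    """Return the distinct characters of text that are not in skip, in first-seen order."""
    return list(dict.fromkeys(ch for ch in text if ch not in skip))

text = "猫とねこ、犬"
skip = set("とねこ、")  # illustrative: only the kana and punctuation used in this sample

print(unique_chars(text, skip))                # ['猫', '犬']
print(unique_chars(text, skip - set("ねこ")))  # ['猫', 'ね', 'こ', '犬']  (ね and こ now counted)
print(unique_chars(text, skip | set("犬")))    # ['猫']  (犬 now excluded)
```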
Some ways to use the script:
- Find out which unique kanji appear in an article, and how many.
- If you are keeping track of your Japanese vocabulary, find out how many unique kanji it covers.
- After pasting your text, add the kanji you already know to "skip" to see how many kanji in that text you don't know yet.
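The last use case, filtering out kanji you already know, might look like this in practice. The name `known` and the sample values are assumptions for illustration; in the script itself you would simply append those kanji to "skip".

```python
text = "日本語の勉強は楽しい"
skip = set("のはしい")  # illustrative: only the kana that occur in this sample
known = set("日本語")   # hypothetical list of kanji you already know

# Treat known kanji exactly like any other skipped character.
excluded = skip | known
unknown = list(dict.fromkeys(ch for ch in text if ch not in excluded))
print(len(unknown), unknown)  # 3 ['勉', '強', '楽']
```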