From f6ff4cff0a5ec7e2ef3ecbe6a5185ed9435b1b13 Mon Sep 17 00:00:00 2001
From: haydenwong7bm <51369959+haydenwong7bm@users.noreply.github.com>
Date: Sat, 4 Feb 2023 22:02:01 +0800
Subject: [PATCH] Split to two languages

---
 README.md    |  3 ++-
 README_en.md | 44 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+), 1 deletion(-)
 create mode 100644 README_en.md

diff --git a/README.md b/README.md
index d9f7888..8a89e77 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,5 @@
-
+* [For English, please click here.](https://github.com/haydenwong7bm/inherited-glyphs-converter/blob/master/README_en.md)
+
 # 傳承字形轉換器
 
 轉換中文文字至[傳承字形](https://zh.wikipedia.org/wiki/%E8%88%8A%E5%AD%97%E5%BD%A2)(大致根據[《傳承字形檢校表》](https://github.com/ichitenfont/inheritedglyphs)標準),消除[新字形](https://zh.wikipedia.org/wiki/%E6%96%B0%E5%AD%97%E5%BD%A2)、[香港](https://zh.wikipedia.org/wiki/%E5%B8%B8%E7%94%A8%E5%AD%97%E5%AD%97%E5%BD%A2%E8%A1%A8)及[臺灣](https://zh.wikipedia.org/wiki/%E5%9C%8B%E5%AD%97%E6%A8%99%E6%BA%96%E5%AD%97%E9%AB%94)標準異體字,如該異體字於Unicode[分開編碼](https://zh.wikipedia.org/wiki/%E4%B8%AD%E6%97%A5%E9%9F%93%E7%B5%B1%E4%B8%80%E8%A1%A8%E6%84%8F%E6%96%87%E5%AD%97#%E8%AA%8D%E5%90%8C%E5%8E%9F%E5%89%87%E8%88%87%E5%8E%9F%E5%AD%97%E9%9B%86%E5%88%86%E9%9B%A2%E5%8E%9F%E5%89%87)。
diff --git a/README_en.md b/README_en.md
new file mode 100644
index 0000000..325f822
--- /dev/null
+++ b/README_en.md
@@ -0,0 +1,44 @@
+* [請點擊這裏査看中文版。](https://github.com/haydenwong7bm/inherited-glyphs-converter/)
+
+# inherited-glyphs-converter
+Converts CJK text to its [inherited glyphs](https://en.wikipedia.org/wiki/Jiu_zixing) form (mostly following the [_List of Recommended Inherited Glyph Components_](https://github.com/ichitenfont/inheritedglyphs)), replacing [xin zixing](https://en.wikipedia.org/wiki/Xin_zixing), [Hong Kong](https://en.wikipedia.org/wiki/List_of_Graphemes_of_Commonly-Used_Chinese_Characters) and [Taiwan](https://en.wikipedia.org/wiki/Standard_Form_of_National_Characters) standard variants wherever the variant is [encoded separately](https://en.wikipedia.org/wiki/CJK_Unified_Ideographs#CJK_Unified_Ideographs) in Unicode.
+
+The converter keeps [Shinjitai](https://en.wikipedia.org/wiki/Shinjitai) and [simplified Chinese characters](https://en.wikipedia.org/wiki/Simplified_Chinese_characters) unchanged as far as possible.
+
+## Usage
+
+### Command line
+
+    python .
+
+Command line arguments:
+
+| **Options** | **Usage** | **Default value if `-o` is not provided** |
+|---|---|---|
+| `-o` | Set the options below if this argument is provided. | |
+| `-j` | Use Japanese [compatibility ideographs](https://en.wikipedia.org/wiki/CJK_Compatibility_Ideographs). | `True` |
+| `-k` | Use Korean compatibility ideographs. | `True` |
+| `-t` | Use [CNS 11643 compatibility ideographs](https://en.wikipedia.org/wiki/CJK_Compatibility_Ideographs_Supplement). | `True` |
+| `-s <value>` | If `value` is `c`: Use only [UnihanCore2020](https://www.unicode.org/L2/L2019/19388-unihan-core-2020.pdf) characters on supplementary planes.<br>If `value` is `*`: Use all characters on supplementary planes. | `c` |
+| `-i` | Convert other inherited variants (e.g. 秘 → 祕, 裡 → 裏). | `True` |
+
+### Import module
+The `inheritedglyphs` module provides a single function, `convert()`, which converts a string to its inherited glyphs form.
+
+Function arguments:
+
+| **Arguments** | **Usage** | **Default value** |
+|---|---|---|
+| `use_compatibility` | An iterable that contains `'j'`, `'k'`, and/or `'t'`.<br>`'j'`: Use Japanese [compatibility ideographs](https://en.wikipedia.org/wiki/CJK_Compatibility_Ideographs).<br>`'k'`: Use Korean compatibility ideographs.<br>`'t'`: Use [CNS 11643 compatibility ideographs](https://en.wikipedia.org/wiki/CJK_Compatibility_Ideographs_Supplement). | `'jkt'` |
+| `convert_inherited` | If `True`, convert other inherited variants (e.g. 秘 → 祕, 裡 → 裏). | `True` |
+| `use_supp` | One of `False`, `'c'`, or `'*'`.<br>`'c'`: in supplementary planes, only use [UnihanCore2020](https://www.unicode.org/L2/L2019/19388-unihan-core-2020.pdf) characters.<br>`'*'`: in supplementary planes, use all characters. | `'c'` |
+
+    >>> from inheritedglyphs import *
+    >>> string = '教育及青年發展局是澳門特區政府社會文化司成立的公共部門。'
+    >>> print(convert(string))
+    敎育及靑年發展局是澳門特區政府社會文化司成立的公共部門。
+    >>> print(convert(string, use_compatibility='j')) # don't use Korean and CNS compatibility ideographs
+    敎育及靑年發展局是澳門特區政府社會文化司成立的公共部門。
+    >>> string = '李白(唐‧五言絶句)《靜夜思》:「床前明月光,疑是地上霜,舉頭望明月,低頭思故鄉。」'
+    >>> print(convert(string, convert_inherited=False))
+    李白(唐‧五言絕句)《靜夜思》:「床前明月光,疑是地上霜,擧頭望明月,低頭思故鄕。」
\ No newline at end of file
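
Review note: the `use_compatibility` and `use_supp` options in the README above depend on two Unicode distinctions, namely whether a character sits in a CJK compatibility ideograph block and whether it is encoded on a supplementary plane. The sketch below is one way a reader might inspect converter output for such characters. It is not part of the `inheritedglyphs` module; the helper names `in_compatibility_block`, `is_supplementary` and `summarize` are hypothetical, and only the Unicode block ranges are taken from the standard.

```python
# Illustrative helpers only; these are NOT part of the inheritedglyphs module.

# Unicode blocks that hold CJK compatibility ideographs.
COMPATIBILITY_BLOCKS = (
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs (BMP)
    (0x2F800, 0x2FA1F),  # CJK Compatibility Ideographs Supplement (plane 2)
)


def in_compatibility_block(ch: str) -> bool:
    """Return True if ch lies inside a compatibility ideograph block.

    This is a block-membership test only: a handful of code points in
    U+F900..U+FAFF are actually unified ideographs, and some are unassigned.
    """
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in COMPATIBILITY_BLOCKS)


def is_supplementary(ch: str) -> bool:
    """Return True if ch is encoded outside the Basic Multilingual Plane."""
    return ord(ch) > 0xFFFF


def summarize(text: str) -> dict:
    """Count the characters of text that fall into each category."""
    return {
        "compatibility": sum(in_compatibility_block(c) for c in text),
        "supplementary": sum(is_supplementary(c) for c in text),
    }
```

Running `summarize()` on the result of `convert()` with different option values would show how many characters each option introduces, which may help when the target font or system lacks supplementary-plane or compatibility glyphs.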