A single apostrophe indicates the Chinese simplified form of the radical (for example, U+9F7F 齿 for U+9F52 齒) and two apostrophes indicate the non-Chinese simplified form of the radical (for example, U+6B6F 歯 for U+9F52 齒).
Oh, that's great, another breaking update to the database 😅
From what I understand, what they call "non-Chinese" are actually Japanese characters. (The one they give as an example is the Japanese kanji for tooth: 歯)
Before updating this, I'll do a quick sanity check that there is no weird stuff going on here, but the best solution would be to have "Chinese Simplified" and "Japanese Simplified" properties. (AFAIK, the PRC and Japan are the only two countries to have applied an official simplification process to Chinese characters, so hopefully there won't be an exception.)
So, I checked, and…
For radical 182, I'm not sure where it comes from 🙁
For radical 208, it is indeed a Japanese kanji, but a lesser used variant. (And also not a radical? Traditional one is still the official radical)
Others seem to be ok.
I don't really know what to make of it. It would seem that when the radical field is empty, it means that the character is an alternate (simplified) writing and not a proper radical, but that's a weird way to reference words here… 🤔
CJKRadicals-15.1.0.txt uses apostrophes after the radical number to indicate that the ideograph uses a standard simplification. From Unicode® Standard Annex #38 UNICODE HAN DATABASE (UNIHAN):
The ProcessCjkRadicalsFile method handles the single apostrophe case, but throws on the two apostrophe case at NetUnicodeInfo/System.Unicode.Build.Core/UnicodeDataProcessor.cs, line 246 in 16ae6bc.
Note also that the non-Chinese simplified form of the radical can have an empty CJK radical character if the CJK radical character is not included in the Kangxi Radicals block or the CJK Radicals Supplement block, so the following would also need to handle an empty character: NetUnicodeInfo/System.Unicode.Build.Core/UnicodeDataProcessor.cs, line 251 in 16ae6bc.
I'd be happy to add support for the non-Chinese simplified form. How would you prefer to represent an empty character on CjkRadicalData - as char?
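To make the two cases concrete, here is a minimal sketch of a parser that accepts zero, one, or two apostrophes after the radical number and tolerates an empty radical field. It is written in Python for brevity and is not the project's C# code: RadicalForm, RadicalEntry, and parse_radical_line are hypothetical names, and the sample lines only illustrate the shape of the file (the last one, with an empty radical field, is made up for the example rather than copied from CJKRadicals-15.1.0.txt). An optional string stands in for a nullable character here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RadicalForm(Enum):
    # The enum value equals the number of apostrophes after the radical number.
    TRADITIONAL = 0
    CHINESE_SIMPLIFIED = 1
    NON_CHINESE_SIMPLIFIED = 2


@dataclass(frozen=True)
class RadicalEntry:
    number: int
    form: RadicalForm
    radical_char: Optional[str]  # None when the radical field is empty
    ideograph: str


def parse_radical_line(line: str) -> Optional[RadicalEntry]:
    """Parse one 'number; radical; ideograph' line; return None for comments and blanks."""
    content = line.split("#", 1)[0].strip()
    if not content:
        return None
    fields = [field.strip() for field in content.split(";")]
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
    number_field, radical_field, ideograph_field = fields

    # Count trailing apostrophes instead of assuming there is at most one.
    digits = number_field.rstrip("'")
    apostrophes = len(number_field) - len(digits)
    if not (digits.isascii() and digits.isdigit()) or apostrophes > 2:
        raise ValueError(f"unsupported radical number: {number_field!r}")

    # An empty radical field becomes None rather than an error.
    radical_char = chr(int(radical_field, 16)) if radical_field else None
    return RadicalEntry(
        number=int(digits),
        form=RadicalForm(apostrophes),
        radical_char=radical_char,
        ideograph=chr(int(ideograph_field, 16)),
    )


# Illustrative input only; the last line is a made-up shape with an empty radical field.
SAMPLE = """\
# radical number; CJK radical character; CJK unified ideograph
211; 2FD2; 9F52
211'; 2EEE; 9F7F
211''; 2EED; 6B6F
208''; ; 9F21
"""

entries = [e for e in map(parse_radical_line, SAMPLE.splitlines()) if e is not None]
for entry in entries:
    print(entry.number, entry.form.name, entry.radical_char, entry.ideograph)
```

Keeping the form as a separate value, rather than as a pair of optional properties, means a later third kind of simplification would only need a new enum member; whether that fits CjkRadicalData is a design call for the maintainer.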