Not all betacode signs and combinations implemented. #14
Comments
After betacoding my full corpora and normalizing them ("NFD"), I ran character statistics on the result and got the following:
! 0x21 EXCLAMATION MARK 197
# 0x23 NUMBER SIGN 112
$ 0x24 DOLLAR SIGN 1
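A minimal sketch of how statistics like these could be gathered, assuming only Python's standard `unicodedata` module. The function names are hypothetical and are not part of any library discussed in this thread; the output format simply mimics the lines above.

```python
import unicodedata
from collections import Counter


def char_statistics(text):
    """Count every code point in the NFD-normalized text."""
    decomposed = unicodedata.normalize("NFD", text)
    return Counter(decomposed)


def format_statistics(counts):
    """Render one 'char 0xHEX NAME count' line per code point, sorted by code point."""
    lines = []
    for char, count in sorted(counts.items()):
        # Control characters have no Unicode name, so fall back to a placeholder.
        name = unicodedata.name(char, "<unnamed>")
        shown = char if char.isprintable() else repr(char)
        lines.append(f"{shown} {ord(char):#04x} {name} {count}")
    return lines


def leftover_ascii(counts):
    """Return the non-whitespace ASCII characters still present after conversion.

    In converted Greek text these are candidates for unconverted Beta Code,
    although ordinary punctuation will show up here as well.
    """
    return {c: n for c, n in counts.items() if c.isascii() and not c.isspace()}


if __name__ == "__main__":
    sample = "\u03bb\u03cc\u03b3\u03bf\u03c2! j#"  # converted text with leftovers
    for line in format_statistics(char_statistics(sample)):
        print(line)
```

Filtering the counts with `leftover_ascii` narrows the report to characters such as `j`, `?`, `&` and `#`, which is where unconverted input is most likely to hide.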
It seems that j, J, v, V, ?, &, # could have better support; many occurrences of them are left unconverted. The library is well done, but this use case needs it to be perfect. |
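Thanks for the issue. I'm not surprised that some combinations are missing, as it is hard to get an exhaustive list. Let me take a look and try to resolve some of these. Completely agree that perfection is needed here! Here is what I see as the initial issues from your comment:
* 'j' is completely unsupported right now. Support should be easy to add.
* 'v' and '*v' are also completely unsupported.
* There's no support for '?'. I'll have to add that in too. It's a combining character so just more work to look up all the characters it can combine with legally.
* No support for '#' characters.
* No '%' support. These are apparently escape characters.
Just to be clear, there's no casing distinction in this library for input. So J and j are treated identically (there's no '*j' or 'J'), and by the same token there's no 'V'. There may be more issues, but these are easy to start with. Some of the ones I don't see any immediate issues with but will have to investigate (or more examples would be helpful):
* '&' has some support. Maybe I'm missing some macron combinations.
* '!' is weird to see in the output.
* There's a lot of parens in the output, that seems fishy.
If you are looking to convert so much real text, we'll probably also need some more back and forth on this if you want high quality. Please let me know if you'd like to spin up an email or gitter chat to make this easier. |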
Let's make things easier by spinning up a chat; more people could perhaps get involved over time.
On 9 October 2022 at 22:47:36 +02:00, Matias Grioni <***@***.***> wrote:
Thanks for the issue.
I'm not surprised that there are some combinations missing, as it is hard to get an exhaustive list. Let me take a look and try to resolve some of these.
Completely agree that perfection is needed here!
Here are what I see as initial issues from your comment:
* 'j' is completely unsupported right now. Support should be easy to add.
* 'v' and '*v' are also completely unsupported.
* There's no support for '?'. I'll have to add that in too. It's a combining character so just more work to look up all the characters it can combine with legally.
* No support for '#' characters.
* No '%' support. These are apparently escape characters.
Just to be clear there's no casing distinction in this library for input. So J and j are treated identically (there's no '*j' or 'J'), and by the same token there's no 'V'.
There may be more issues but these are easy to start with.
Some of the ones I don't see any immediate issues with but will have to investigate (or more examples would be helpful):
* '&' has some support. Maybe I'm missing some macron combinations.
* '!' is weird to see in the output.
* There's a lot of parens in the output, that seems fishy.
If you are looking to convert so much real text we'll probably also need some more back and forth on this if you want high quality. Please let me know if you'd like to spin up an email or gitter chat to make this easier.
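On the '?' combining character mentioned in the list above: a possible alternative to looking up every legal combination is to append the combining mark and let Unicode normalization order the marks. The sketch below is not how the library works. It assumes that Beta Code '?' corresponds to COMBINING DOT BELOW (U+0323), which should be checked against the Beta Code documentation, and the function name is hypothetical.

```python
import unicodedata

# Assumption: Beta Code '?' maps to COMBINING DOT BELOW.
COMBINING_DOT_BELOW = "\u0323"


def add_dot_below(base):
    """Attach a dot below to `base` and return the NFD form.

    NFD applies canonical ordering, so the dot below (combining class 220)
    is placed before accents and breathings (class 230) regardless of the
    order in which the marks were appended.
    """
    return unicodedata.normalize("NFD", base + COMBINING_DOT_BELOW)


if __name__ == "__main__":
    plain = add_dot_below("\u03b1")      # alpha
    accented = add_dot_below("\u03ac")   # alpha with tonos
    print([f"U+{ord(c):04X}" for c in plain])
    print([f"U+{ord(c):04X}" for c in accented])
```

Since the corpus in question is NFD-normalized anyway, this approach would avoid a per-character table for '?', at the cost of never producing precomposed forms.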
|
Sure. You can join the betacode room here and we can discuss there. |
Hi, I recently wrote version 0.1 of perseus-converter. Using my own converter, I successfully exported the whole Perseus Digital Library to UTF-8 text files, normalized and decomposed.
I noticed that not all betacode is properly restored; for an example, please look at rows 4, 9, 14, 176 and 177 of https://github.com/kristoffer-paulsson/koine-corpora/blob/main/koine/_elegy-and-iambus-volume-ii.txt. Could you please consider implementing the combinations that may be missing?
There may also be unimplemented combinations among those described in https://en.wikipedia.org/wiki/Beta_Code