Improve phonemize for multi-language support #108
Comments
We should remove all the replaces; I didn't notice that.
Please have a look at the proposed design here: the idea of having specific pre-processing per language is good, and it definitely worked well with English. I think it's a good idea to keep it around while also allowing other languages the same possibility. For instance, "R$ 10,10", which is "dez reais" (Portuguese for "ten reals"), is spelled "R dolar thousand and ten" by the current version of Tokenizer, but after the split it is read as "R Dolar ten ten". Ideally, once a PortugueseTokenizer is implemented, we would hear something like "dez reais e dez centavos". If you wish to have that merged, let me know the next steps.
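To make the idea concrete, here is a rough sketch of the kind of per-language pre-processing step described above. The function names and the regex are illustrative only (not the PR's actual code), and the number-to-words conversion is stubbed out; a real version could use a library such as num2words.

```python
import re

def expand_brl_currency(text: str) -> str:
    """Hypothetical PortugueseTokenizer pre-processing step: expand
    "R$ 10,10" into words so it can be read as "dez reais e dez centavos"
    instead of being spelled out symbol by symbol."""
    def _repl(m: re.Match) -> str:
        reais, centavos = m.group(1), m.group(2)
        # num_to_words_pt is a stand-in; a real implementation would
        # convert "10" to "dez", e.g. via num2words.
        return f"{num_to_words_pt(reais)} reais e {num_to_words_pt(centavos)} centavos"
    return re.sub(r"R\$\s*(\d+(?:\.\d{3})*),(\d{2})", _repl, text)

def num_to_words_pt(digits: str) -> str:
    # Placeholder number-to-words conversion; returns the digits unchanged.
    return digits

print(expand_brl_currency("O preço é R$ 10,10."))
# -> "O preço é 10 reais e 10 centavos." with the placeholder converter
```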
I didn't understand what makes the tokenizer spell it well (besides the PR).
Also, one feature / bug per PR, and keep each PR small and focused.
Did you see the misaki example?
I haven't seen the misaki example, would you please link it to me?
Removing only the replace calls doesn't do the job, because in normalize_text there is a bunch of pre-processing that makes formats like "$10,000.52" pronounceable, among other things: for instance, stripping the "," from "$10,000.52" to get "$10000.52". That works completely differently in other languages; in Portuguese the equivalent of "$10,000.52" is "R$10.000,52". So the change is more fundamental, and if we take all the replaces out of normalize_text, English quality won't be as good. In my PR, Tokenizer is the version where most, if not all, replaces are removed, and EnglishTokenizer is the version where the replacements relevant to English are kept; this way we guarantee there is room for specialization. The trade-off is that we had to introduce a Factory to create the right Tokenizer version.
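A minimal sketch of the split described in this comment, under the assumption that the base/specialized classes and the factory look roughly like this (names mirror the description but are not the exact PR code):

```python
import re

class Tokenizer:
    """Language-neutral base: only generic clean-up, no English-specific replaces."""
    def normalize_text(self, text: str) -> str:
        return " ".join(text.split())

class EnglishTokenizer(Tokenizer):
    """Keeps the English-specific pre-processing, e.g. "$10,000.52" -> "$10000.52"."""
    def normalize_text(self, text: str) -> str:
        text = super().normalize_text(text)
        # Dropping "," between digits is safe for English thousands separators,
        # but would mangle "R$10.000,52", so it stays out of the base class.
        return re.sub(r"(?<=\d),(?=\d)", "", text)

def create_tokenizer(lang: str) -> Tokenizer:
    # Factory: pick the specialized tokenizer when one exists,
    # fall back to the neutral base otherwise.
    if lang.startswith("en"):
        return EnglishTokenizer()
    return Tokenizer()

print(create_tokenizer("en-us").normalize_text("It costs $10,000.52"))  # It costs $10000.52
print(create_tokenizer("pt-br").normalize_text("Custa R$10.000,52"))    # Custa R$10.000,52
```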
The other option I see is to really remove all language-specific replacements and pre-processing and delegate that to a library that already does it, but I don't know of one, and removing them carelessly now will definitely degrade English quality.
https://github.com/thewh1teagle/kokoro-onnx/blob/main/examples/language.py Try with misaki |
I don't understand phonemes, so it's hard for me to judge, but I've been using the PR I sent to generate Brazilian Portuguese narration for some storytelling TikTok videos, and the result has been great, better than the other alternatives I tried. It could be even better, though, if we implemented the PortugueseTokenizer to pre-process some of the text into a more readable format, the same as is already done for English. But I guess that's a discussion for another feature.
OK, I spent some time looking into misaki; it indeed doesn't support pt-br, but espeak does quite well. I modified the language.py example to output good-sounding Portuguese audio:
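The modified snippet itself didn't make it into the thread; below is a guess at what an espeak-backed pt-br call could look like, following the style of the kokoro-onnx README example. The model/voice file names, the voice id, and the chosen text are assumptions, not the commenter's actual code.

```python
import soundfile as sf
from kokoro_onnx import Kokoro

# File names and voice id are assumptions; adjust to whatever you have locally.
kokoro = Kokoro("kokoro-v0_19.onnx", "voices.bin")
samples, sample_rate = kokoro.create(
    "Olá, tudo bem? Este é um teste em português do Brasil.",
    voice="af_sarah",   # assumed voice; pick one that sounds right for pt-br
    speed=1.0,
    lang="pt-br",       # espeak language code for Brazilian Portuguese
)
sf.write("audio_pt_br.wav", samples, sample_rate)
```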
Should Misaki be the one dealing with language-specific text pre-processing? Perhaps that's the part I've been missing.
Describe the feature
phonemizer.phonemize should already encapsulate phoneme alterations for diverse languages; by injecting phoneme replacements you're binding kokoro-onnx to English, which is a bad design choice. I've done a simple test on my computer and got Brazilian Portuguese generation to sound almost perfect just by removing all these replacements.
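For reference, a minimal example of what delegating the language handling to the phonemizer library looks like; the language code follows espeak, and the specific flags shown are just one reasonable configuration, not necessarily what kokoro-onnx uses internally.

```python
from phonemizer import phonemize

# phonemize() already performs per-language grapheme-to-phoneme conversion,
# so no English-specific phoneme replacements are needed on top of it.
phonemes = phonemize(
    "Dez reais e dez centavos.",
    language="pt-br",        # espeak language code for Brazilian Portuguese
    backend="espeak",
    strip=True,
    preserve_punctuation=True,
    with_stress=True,
)
print(phonemes)
```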