Improve phonemize for multi-language support #108

HDANILO · 2025-02-16T00:18:25Z

Describe the feature

    def phonemize(self, text, lang="en-us", norm=True) -> str:
        """
        lang can be 'en-us' or 'en-gb'
        """
        if norm:
            text = Tokenizer.normalize_text(text)

        phonemes = phonemizer.phonemize(
            text, lang, preserve_punctuation=True, with_stress=True
        )

        # https://en.wiktionary.org/wiki/kokoro#English
        phonemes = phonemes.replace("kəkˈoːɹoʊ", "kˈoʊkəɹoʊ").replace(
            "kəkˈɔːɹəʊ", "kˈəʊkəɹəʊ"
        )
        phonemes = (
            phonemes.replace("ʲ", "j")
            .replace("r", "ɹ")
            .replace("x", "k")
            .replace("ɬ", "l")
        )
        phonemes = re.sub(r"(?<=[a-zɹː])(?=hˈʌndɹɪd)", " ", phonemes)
        phonemes = re.sub(r' z(?=[;:,.!?¡¿—…"«»“” ]|$)', "z", phonemes)
        if lang == "en-us":
            phonemes = re.sub(r"(?<=nˈaɪn)ti(?!ː)", "di", phonemes)
        phonemes = "".join(filter(lambda p: p in VOCAB, phonemes))
        return phonemes.strip()

phonemizer.phonemize should already handle language-specific phoneme adjustments. By hard-coding phoneme replacements here you bind kokoro-onnx to English, which is a poor design choice.

I ran a simple test on my machine and got Brazilian Portuguese generation to sound almost perfect just by removing all of these replacements.
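
To illustrate the point, here is a minimal sketch of how the English-only fixes could be gated on the language, so that other languages keep the phonemizer output untouched. The function name `postprocess_phonemes` and the tiny `VOCAB` are hypothetical stand-ins, not the library's API; only the regexes are taken from the snippet above.

```python
import re

# Hypothetical stand-in for the real VOCAB; the library's set is much larger.
VOCAB = set("abdefhijklmnoprstuvwxz ˈˌːɹəɪʊʌɔ,.!?")


def postprocess_phonemes(phonemes: str, lang: str, vocab=VOCAB) -> str:
    """Apply English-specific fixes only for English, then filter to vocab."""
    if lang.startswith("en"):
        # These rewrites were tuned for English voices only.
        phonemes = phonemes.replace("r", "ɹ")
        phonemes = re.sub(r"(?<=[a-zɹː])(?=hˈʌndɹɪd)", " ", phonemes)
    if lang == "en-us":
        phonemes = re.sub(r"(?<=nˈaɪn)ti(?!ː)", "di", phonemes)
    # Dropping symbols the model does not know stays language-neutral.
    return "".join(p for p in phonemes if p in vocab).strip()
```

With this shape, a `pt-br` string passes through with only the vocabulary filter applied, while `en-us` input still gets the existing fixes.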

thewh1teagle · 2025-02-16T03:58:00Z

We should remove all the replaces; I didn't notice them.
You can create a PR, or I'll update it in a few days.

HDANILO · 2025-02-16T05:14:53Z

Please have a look at the proposed design here:

#109

The idea of language-specific pre-processing is good, and it clearly worked well for English. I think it is worth keeping, while giving other languages the same possibility.

For instance, "R$ 10,10" (an amount in reais; "dez reais" is Portuguese for "ten reals") is read as "R dolar thousand and ten" by the current version of Tokenizer. After the split it is read as "R Dolar ten ten". Ideally, once a PortugueseTokenizer is implemented, we would hear something like "dez reais e dez centavos".

If you would like to have that merged, let me know the next steps.
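
As a rough sketch of what such a Portuguese pre-processing step might do, the snippet below rewrites Brazilian currency amounts into words plus digits before phonemization. The names `BRL_PATTERN` and `expand_brl` are hypothetical and not part of kokoro-onnx or the PR; the digits themselves are left for the phonemizer to verbalize.

```python
import re

# Matches amounts such as "R$ 10,10", "R$10.000,52" or "R$ 5".
BRL_PATTERN = re.compile(r"R\$\s?(\d{1,3}(?:\.\d{3})+|\d+)(?:,(\d{2}))?")


def expand_brl(text: str) -> str:
    """Rewrite 'R$ 10,10' as '10 reais e 10 centavos' (digits kept as-is)."""

    def repl(match: re.Match) -> str:
        # In pt-BR "." groups thousands and "," marks the decimals.
        reais = int(match.group(1).replace(".", ""))
        centavos = int(match.group(2) or 0)
        parts = []
        if reais:
            parts.append(f"{reais} {'real' if reais == 1 else 'reais'}")
        if centavos:
            parts.append(f"{centavos} {'centavo' if centavos == 1 else 'centavos'}")
        return " e ".join(parts) or "0 reais"

    return BRL_PATTERN.sub(repl, text)
```

For example, `expand_brl("R$ 10,10")` yields `"10 reais e 10 centavos"`, which is much closer to the intended reading than "R Dolar ten ten".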

thewh1teagle · 2025-02-16T07:04:31Z

I didn't understand what makes the tokenizer spell it well (besides the PR).

thewh1teagle · 2025-02-16T07:05:15Z

Also, one feature or bug per PR, small and focused.
I meant to only remove the replace calls.

thewh1teagle · 2025-02-16T07:05:37Z

Did you see the misaki example?
It should spell it well.

HDANILO · 2025-02-16T07:59:31Z

> Did you see the misaki example?

I haven't seen the misaki example; could you please link it?

> I meant to only remove the replace calls.

Removing only the replace calls doesn't do the job, because normalize_text also does a lot of pre-processing so that formats like "$10,000.52" are pronounced correctly. For example, it strips the "," from "$10,000.52", giving "$10000.52". Other languages work completely differently: in Portuguese the equivalent of "$10,000.52" is "R$10.000,52". So the change is more fundamental, and if we take all the replaces out of normalize_text, English quality won't be as good.

In my PR, Tokenizer is the version with most, if not all, replaces removed, and EnglishTokenizer is the version that keeps the replacements relevant to English. This leaves room for per-language specialization. The trade-off is that we had to introduce a Factory to create the right Tokenizer version.
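
The separator difference can be made concrete with a small sketch. The `SEPARATORS` table and `parse_amount` below are hypothetical helpers, not part of normalize_text; they only show why stripping "," is correct for English but wrong for Portuguese.

```python
# Hypothetical per-language separators: (thousands, decimal).
SEPARATORS = {
    "en-us": (",", "."),
    "en-gb": (",", "."),
    "pt-br": (".", ","),
}


def parse_amount(number: str, lang: str) -> float:
    """Parse a locale-formatted number string into a float."""
    thousands, decimal = SEPARATORS[lang]
    # Drop the grouping separator first, then normalize the decimal mark.
    return float(number.replace(thousands, "").replace(decimal, "."))
```

Here `parse_amount("10,000.52", "en-us")` and `parse_amount("10.000,52", "pt-br")` describe the same value, while applying the English rule to the Portuguese string would misread it.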
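
A minimal sketch of that split, assuming a neutral base class, an English subclass, and a factory function. The class names follow the description above, but the method bodies and `create_tokenizer` are illustrative assumptions and may not match the code in the PR.

```python
import re


class Tokenizer:
    """Language-neutral base: no language-specific replacements."""

    def normalize_text(self, text: str) -> str:
        return text.strip()


class EnglishTokenizer(Tokenizer):
    """Keeps the replacements that only make sense for English."""

    def normalize_text(self, text: str) -> str:
        text = super().normalize_text(text)
        # Drop thousands separators: "$10,000.52" -> "$10000.52".
        return re.sub(r"(?<=\d),(?=\d{3})", "", text)


def create_tokenizer(lang: str) -> Tokenizer:
    """Hypothetical factory: pick the most specific tokenizer for a language."""
    if lang.startswith("en"):
        return EnglishTokenizer()
    return Tokenizer()
```

Adding a PortugueseTokenizer later would then be one more subclass and one more branch in the factory, without touching the English path.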

HDANILO · 2025-02-16T08:10:30Z

The other option I see is to remove all language-specific replacements and pre-processing and delegate them to a library that already does this, but I do not know of one. Removing them carelessly now would definitely degrade English quality.

thewh1teagle · 2025-02-16T09:26:50Z

https://github.com/thewh1teagle/kokoro-onnx/blob/main/examples/language.py

Try with misaki.
I don't know if it supports your language; the details are there.

HDANILO · 2025-02-16T19:18:25Z

I don't understand phonemes, so it is hard for me to judge, but I have been using the PR I sent to generate Brazilian Portuguese narration for storytelling TikTok videos, and the results have been great, better than the other alternatives I tried. It could be better still if we implemented a "PortugueseTokenizer" that pre-processes some of the text into a more readable form, as is already done for English.

But I guess that's a discussion for another feature.

HDANILO · 2025-02-16T20:37:12Z

> Try with misaki

OK, I spent some time looking into misaki. It indeed doesn't support pt-br, but espeak does quite well. I modified language.py to output good-sounding Portuguese audio:

"""
Note: on Linux you need to run this as well: apt-get install portaudio19-dev

1. Prepare virtual environment
    uv venv --seed -p 3.11
    source .venv/bin/activate

2. Install packages
    pip install kokoro-onnx sounddevice 'misaki[en]'

3. Download models
    wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/kokoro-v1.0.onnx
    wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/voices-v1.0.bin

4. Run
    python examples/language.py

Please read carefully https://huggingface.co/hexgrad/Kokoro-82M/blob/main/VOICES.md
To use other languages install misaki with the specific language. Example: pip install misaki[ko] (Korean). And change the import. Example: from misaki.ko import KOG2P
"""

import ctypes

import espeakng_loader
import phonemizer
import sounddevice as sd
from phonemizer.backend.espeak.wrapper import EspeakWrapper

from kokoro_onnx import Kokoro, log

# Check that the espeak-ng library can be loaded
try:
    ctypes.cdll.LoadLibrary(espeakng_loader.get_library_path())
except Exception as e:
    log.error(f"Failed to load espeak shared library: {e}")

EspeakWrapper.set_data_path(espeakng_loader.get_data_path())
EspeakWrapper.set_library(espeakng_loader.get_library_path())

# Kokoro
kokoro = Kokoro("kokoro-v1.0.onnx", "voices-v1.0.bin")

# Phonemize
text = "Kokoro é uma biblioteca de conversão de texto em fala."
phonemes = phonemizer.phonemize(
    text, language="pt-br", with_stress=True, backend="espeak"
)

# Create
samples, sample_rate = kokoro.create(
    phonemes, voice="pm_alex", is_phonemes=True, lang="pt-br"
)

# Play
print("Playing audio...")
sd.play(samples, sample_rate)
sd.wait()

HDANILO · 2025-02-16T20:37:57Z

Should misaki be the one dealing with language-specific text pre-processing? Perhaps that's the part I've been missing.

HDANILO added the feature Further information is requested label Feb 16, 2025

HDANILO assigned thewh1teagle Feb 16, 2025
