
Some Chinese text pronunciations are weird. #101

Open
lyris opened this issue Feb 10, 2025 · 5 comments
Labels
bug Something isn't working


lyris commented Feb 10, 2025

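### What happened?

The .onnx version pronounces some words unclearly, for example "质量" ("quality"), while the .pt version has no problem.
text: '图的质量总体来说比之前好,会有些图的质量非常高,但不稳定,并且画风会变来变去。' (roughly: "Overall the image quality is better than before, and some images are of very high quality, but it is unstable and the art style keeps changing.")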
phonemes: 'tʰu↗ tɤ ꭧɨ↘lja↘ŋ ʦʊ↓ŋtʰi↓ lai↗ʂwo→ pi↓ ꭧɨ→ʨʰjɛ↗n xau↓, xwei↘ jou↓ɕje→ tʰu↗ tɤ ꭧɨ↘lja↘ŋ fei→ꭧʰa↗ŋ kau→, ta↘n pu↘ wə↓nti↘ŋ. pi↘ŋʨʰje↓ xwa↘fə→ŋ xwei↘ pjɛ↘nlai↗pjɛ↘nʨʰy↘.'

Example audio files (attached):
.onnx
.pt

### Steps to reproduce

Using the latest ONNX package, kokoro-onnx 0.4.2:

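```python
import os

import soundfile as sf
from kokoro_onnx import Kokoro
from misaki.zh import ZHG2P

kokoro = Kokoro("kokoro-v1.0.onnx", "voices-v1.0.bin")
text = '图的质量总体来说比之前好,会有些图的质量非常高,但不稳定,并且画风会变来变去。'
lang = 'cmn'
g2p = ZHG2P()
phonemes = g2p(text)  # phonemize once with misaki; the result does not depend on the voice
os.makedirs("output", exist_ok=True)  # sf.write fails if the output directory is missing

for voice in kokoro.get_voices():
    if voice.startswith('z'):  # Mandarin voices (zf_*, zm_*)
        print(f'{voice} speak {text} {phonemes}')
        # Pass the misaki phonemes straight to the model instead of letting kokoro-onnx phonemize
        samples, sample_rate = kokoro.create(phonemes, voice=voice, lang=lang, is_phonemes=True, trim=False)
        sf.write(f"output/onnx_version_{voice}.wav", samples, sample_rate)
```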
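Because this script hands the misaki phonemes directly to the model, one hypothesis worth checking (an assumption, not confirmed anywhere in this thread) is that the tokenizer in kokoro-onnx 0.4.2 has no entry for some symbols misaki emits for Mandarin, such as the tone arrows or `ꭧ`, and skips them silently. The sketch below lists the characters of a phoneme string that are missing from a vocabulary dict. `find_unmapped_symbols`, `drop_unmapped` and `TOY_VOCAB` are hypothetical helpers; `TOY_VOCAB` is a deliberately incomplete stand-in, not Kokoro's real vocabulary, which you would need to load from wherever your setup defines it (for example a model config file).

```python
def find_unmapped_symbols(phonemes: str, vocab: dict[str, int]) -> list[str]:
    """Distinct characters of `phonemes` that have no token id in `vocab`, in first-seen order."""
    missing: list[str] = []
    for ch in phonemes:
        if ch not in vocab and ch not in missing:
            missing.append(ch)
    return missing


def drop_unmapped(phonemes: str, vocab: dict[str, int]) -> str:
    """What a tokenizer that silently skips unknown characters would effectively keep."""
    return "".join(ch for ch in phonemes if ch in vocab)


# HYPOTHETICAL toy vocabulary, deliberately missing the tone arrows, 'ꭧ' and 'ɨ'.
# It is NOT Kokoro's real vocabulary.
TOY_VOCAB = {ch: i for i, ch in enumerate("tʰuɤljaŋ ,")}

# First three syllables of the phoneme string printed in this issue
sample = "tʰu↗ tɤ ꭧɨ↘lja↘ŋ"
print(find_unmapped_symbols(sample, TOY_VOCAB))  # ['↗', 'ꭧ', 'ɨ', '↘']
print(drop_unmapped(sample, TOY_VOCAB))          # tʰu tɤ ljaŋ
```

An empty list from `find_unmapped_symbols` against the real vocabulary would rule this hypothesis out.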

The native PyTorch package, kokoro==0.7.12 with misaki[zh]==0.7.12 (https://github.com/hexgrad/kokoro), has no problem:

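```python
import os

import soundfile as sf
from kokoro import KPipeline

pipeline = KPipeline(lang_code='z', device='cpu')  # 'z' = Mandarin Chinese
text = '图的质量总体来说比之前好,会有些图的质量非常高,但不稳定,并且画风会变来变去。'
zh_voices = ['zf_xiaobei', 'zf_xiaoni', 'zf_xiaoxiao', 'zf_xiaoyi',
             'zm_yunjian', 'zm_yunxi', 'zm_yunxia', 'zm_yunyang']
os.makedirs('output', exist_ok=True)  # sf.write fails if the output directory is missing

for voice in zh_voices:
    # The pipeline yields (graphemes, phonemes, audio) per text chunk; if the text is
    # split into several chunks, each chunk overwrites the previous file.
    for graphemes, phonemes, audio in pipeline(text, voice=voice):
        samples = audio.shape[0] if audio is not None else 0
        assert samples > 0, "No audio generated"
        print(f'{voice} speak {text} {phonemes}')
        sf.write(f'output/pt_version_{voice}.wav', audio, 24000)  # Kokoro outputs 24 kHz audio
```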
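Both scripts print the phoneme string they synthesize, and only the second one pins the misaki version, so a quick way to tell whether the difference comes from the G2P step or from the model is to diff the two printed strings. A minimal sketch using the standard library's `difflib`; `phoneme_diff` is a hypothetical helper, and the two input strings are made-up placeholders to be replaced with what the scripts actually print.

```python
import difflib


def phoneme_diff(a: str, b: str) -> list[tuple[str, str, str]]:
    """Return (operation, span_in_a, span_in_b) for each region where a and b differ."""
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (op, a[i1:i2], b[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"  # skip the spans that match
    ]


# HYPOTHETICAL inputs: paste the phoneme strings the two scripts actually print.
onnx_ps = "ꭧɨ↘lja↘ŋ"
pt_ps = "ꭧɨ↘ljaŋ"
print(phoneme_diff(onnx_ps, pt_ps))  # [('delete', '↘', '')]
```

An empty list means both scripts fed the model identical phonemes, which would point away from misaki and toward the ONNX model or its tokenizer.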

Example audio files are attached above.

### What OS are you seeing the problem on?

Windows

### Package version

0.4.2

### Relevant log output

```shell
```

lyris added the bug (Something isn't working) label Feb 10, 2025
thewh1teagle (Owner) commented:

Try the language.py example.

lyris (Author) commented Feb 11, 2025:

English has never been a problem; the issue is with the Chinese text in the example above. I based my script on language.py, which uses an English example, and the bug I reported only shows up with Chinese.

fastfading commented:

Same here on an M1 Mac.

fastfading commented:

@lyris Here is what DeepSeek suggested (two screenshots attached).

I'm not an expert, but I hope it helps.

If you fix it, could you send me the fix? Thanks.

