Add a bit of silence between batches of phonemes #84

Open
j-venerable opened this issue Feb 3, 2025 · 2 comments

@j-venerable

Describe the feature

Removing all silence between batches of phonemes results in an unnatural speech flow. I'm talking about this code specifically:

audio_part, _ = librosa.effects.trim(audio_part)

As an experiment I tried to add one second of silence between every batch and it sounded much better IMO. I can make a pull request if you want, adding an extra argument to the create() method so users can choose the duration of silence they want between batches.
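To make the proposal concrete, here is a minimal sketch of the joining step, not the library's actual implementation. The function name `join_batches` and the parameter `silence_seconds` are hypothetical, and the sketch assumes each batch is already a trimmed 1-D NumPy array sharing a single sample rate:

```python
import numpy as np


def join_batches(parts, sample_rate, silence_seconds=0.5):
    """Concatenate audio batches with a fixed pause between neighbours.

    Hypothetical helper for illustration only; it is not part of the
    library. `parts` is a sequence of 1-D arrays that have already been
    trimmed (for example with librosa.effects.trim).
    """
    if silence_seconds < 0:
        raise ValueError("silence_seconds must be non-negative")

    parts = [np.asarray(p, dtype=np.float32) for p in parts]
    if not parts:
        return np.zeros(0, dtype=np.float32)

    # Number of zero samples that corresponds to the requested pause.
    gap = np.zeros(int(round(silence_seconds * sample_rate)), dtype=np.float32)

    # Insert the gap between batches only, never after the last one.
    pieces = [parts[0]]
    for part in parts[1:]:
        pieces.append(gap)
        pieces.append(part)
    return np.concatenate(pieces)
```

Passing `silence_seconds=0.0` reproduces the current behaviour of joining the trimmed batches back to back, so an argument like this could default to zero without changing existing output.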

j-venerable added the "feature" label (Further information is requested) on Feb 3, 2025
@thewh1teagle
Owner

> Removing all silence between batches of phonemes results in an unnatural speech flow.

Can you show an example sentence?

@j-venerable
Author

In my opinion, this is an issue with every long text I have tried. It was more pronounced before upgrading to Kokoro 1.0, so I have adjusted the pause to 0.8s. 0.5s or 0.6s might be a better default. Here is the beginning of Moby Dick:

text = "Some years ago — never mind how long precisely — having little or no money in my purse, and nothing particular to interest me on shore, " \
       "I thought I would sail about a little and see the watery part of the world. It is a way I have of driving off the spleen and regulating the circulation. " \
       "Whenever I find myself growing grim about the mouth; whenever it is a damp, drizzly November in my soul; " \
       "whenever I find myself involuntarily pausing before coffin warehouses, and bringing up the rear of every funeral I meet; " \
       "and especially whenever my hypos get such an upper hand of me, that it requires a strong moral principle to prevent me from deliberately stepping into the street, " \
       "and methodically knocking people’s hats off—then, I account it high time to get to sea as soon as I can. " \
       "This is my substitute for pistol and ball. With a philosophical flourish Cato throws himself upon his sword; " \
       "I quietly take to the ship. There is nothing surprising in this. If they but knew it, almost all men in their degree, some time or other, " \
       "cherish very nearly the same feelings towards the ocean with me."

audio, _ = KOKORO.create(text, voice="af_sarah", speed=0.84, lang="en-us")

Notice the cut after "whenever I find myself involuntarily pausing before coffin warehouses," and "I quietly take to the ship."

Without silence between batches:

mobydick-no-silence.webm

With 0.8 seconds of silence between batches:

mobydick-silence.webm
