GitHub - malaysia-ai/MeloTTS-MS: High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese, Korean and Malay.

Original README at https://github.com/myshell-ai/MeloTTS

Introduction

MeloTTS-MS is a fork of https://github.com/myshell-ai/MeloTTS that adds support for the Malay language. Models and checkpoints, including optimizer states, are released at https://huggingface.co/malaysia-ai/MeloTTS-MS

Improvement

Use the ms phonemizer and the Malaya Speech normalizer for Malay text, melo/text/malay.py,

text = 'hello nama saya.'
text = text_normalize(text)
phones, tones, word2ph = g2p(text)
"""
(['_',
'h',
'ˈɛ',
'l',
'o',
'n',
'ˈa',
'm',
'ə',
's',
'ˈa',
'j',
'ə',
'.',
'_'],
[0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
[1, 4, 4, 4, 1, 1])
"""

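The three values returned by g2p above appear to be aligned: one tone per phone, and each word2ph entry giving the number of phones that belong to one token. The following is a minimal sketch of that reading, using only the values printed above; the helper name group_by_word is hypothetical and not part of the repo.

```python
# Values copied from the g2p output shown above.
phones = ['_', 'h', 'ˈɛ', 'l', 'o', 'n', 'ˈa', 'm', 'ə', 's', 'ˈa', 'j', 'ə', '.', '_']
tones = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]
word2ph = [1, 4, 4, 4, 1, 1]

# One tone per phone, and word2ph accounts for every phone.
assert len(phones) == len(tones) == sum(word2ph)


def group_by_word(phones, word2ph):
    """Hypothetical helper: split the flat phone list into per-token groups."""
    groups, start = [], 0
    for count in word2ph:
        groups.append(phones[start:start + count])
        start += count
    return groups


groups = group_by_word(phones, word2ph)
# Phones carrying tone 1; in this sample they are the stress-marked vowels.
stressed = [p for p, t in zip(phones, tones) if t == 1]
print(groups)
print(stressed)
```

In this sample the groups correspond to the leading pad, "hello", "nama", "saya", the full stop and the trailing pad.
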
Use a pretrained Malaysian BERT, melo/text/malay_bert.py.
Extend the symbol set, melo/text/symbols.py.
Hardcode the vocabulary size (219) and tone count (16) of the pretrained checkpoint during training so that its weights can be loaded, but use the new sizes during inference, melo/models.py,

self.enc_p = TextEncoder(
    n_vocab if is_eval else 219,
    inter_channels,
    hidden_channels,
    filter_channels,
    n_heads,
    n_layers,
    kernel_size,
    p_dropout,
    gin_channels=self.enc_gin_channels,
    num_languages=num_languages,
    num_tones=num_tones if is_eval else 16,
)

Load the official pretrained model, then extend the embedding sizes, melo/train.py,

utils.load_checkpoint(
  hps.pretrain_G,
  net_g,
  None,
  skip_optimizer=True
)

old_embeddings = net_g.module.enc_p.emb
net_g.module.enc_p.emb = net_g.module.get_resized_embeddings(old_embeddings, len(symbols))

old_embeddings = net_g.module.enc_p.tone_emb
net_g.module.enc_p.tone_emb = net_g.module.get_resized_embeddings(old_embeddings, 18)

print(net_g.module.enc_p.emb.weight.shape, net_g.module.enc_p.tone_emb.weight.shape)
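
The body of get_resized_embeddings is not shown here. As a rough sketch of what such a resize usually involves, the hypothetical helper below (resize_embedding is an illustrative name, not the repo's method) allocates a larger nn.Embedding and copies the pretrained rows into it, so only the new rows start from random initialisation. The 16 to 18 tone sizes come from the snippets above; the embedding width of 192 is an assumed value.

```python
import torch
from torch import nn


def resize_embedding(old: nn.Embedding, new_num_embeddings: int) -> nn.Embedding:
    """Hypothetical helper: new embedding table that keeps the old leading rows."""
    old_num, dim = old.weight.shape
    new = nn.Embedding(new_num_embeddings, dim).to(
        device=old.weight.device, dtype=old.weight.dtype
    )
    # Copy as many pretrained rows as fit; remaining rows keep their fresh init.
    n = min(old_num, new_num_embeddings)
    with torch.no_grad():
        new.weight[:n].copy_(old.weight[:n])
    return new


# 16 pretrained tones extended to 18; width 192 is an assumption for illustration.
old_tone = nn.Embedding(16, 192)
new_tone = resize_embedding(old_tone, 18)
print(new_tone.weight.shape)  # torch.Size([18, 192])
```

Because the pretrained rows are preserved, symbols and tones already known to the original model keep their learned vectors, and training only has to learn the added entries from scratch.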

Usage

The Python API and model cards can be found in this repo or on HuggingFace.

Contributing

If you find this work useful, please consider contributing to this repo.

Many thanks to @fakerybakery for adding the Web UI and CLI part.

Authors

Wenliang Zhao at Tsinghua University
Xumin Yu at Tsinghua University
Zengyi Qin (project lead) at MIT and MyShell

Citation

@software{zhao2024melo,
  author={Zhao, Wenliang and Yu, Xumin and Qin, Zengyi},
  title = {MeloTTS: High-quality Multi-lingual Multi-accent Text-to-Speech},
  url = {https://github.com/myshell-ai/MeloTTS},
  year = {2023}
}

License

This library is released under the MIT License, which means it is free for both commercial and non-commercial use.

Acknowledgements

This implementation is based on TTS, VITS, VITS2 and Bert-VITS2. We appreciate their awesome work.
