
High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, Korean and Malay.


malaysia-ai/MeloTTS-MS

 
 


Original README at https://github.com/myshell-ai/MeloTTS

Introduction

MeloTTS MS is a fork of https://github.com/myshell-ai/MeloTTS that adds support for the Malay language. Models and checkpoints, including optimizer states, are released at https://huggingface.co/malaysia-ai/MeloTTS-MS

Improvement

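  1. Use the ms phonemizer and the Malaya Speech normalizer (melo/text/malay.py):
# Import path assumed from the file location melo/text/malay.py
from melo.text.malay import text_normalize, g2p

text = 'hello nama saya.'
text = text_normalize(text)  # normalize the raw text before phonemization
phones, tones, word2ph = g2p(text)  # phones, one tone per phone, phone count per token
"""
(['_', 'h', 'ˈɛ', 'l', 'o', 'n', 'ˈa', 'm', 'ə', 's', 'ˈa', 'j', 'ə', '.', '_'],
 [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
 [1, 4, 4, 4, 1, 1])
"""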
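To make the three outputs above easier to read, here is a minimal sketch that regroups the flat phone and tone lists into per-token chunks. It assumes that `word2ph[i]` is the number of phones belonging to the i-th token (including the leading and trailing `_` padding), which matches the sample output but is not stated in the source. The helper `group_by_word` is hypothetical and not part of this repo.

```python
from itertools import accumulate

# Sample output copied from the g2p example above
phones = ['_', 'h', 'ˈɛ', 'l', 'o', 'n', 'ˈa', 'm', 'ə', 's', 'ˈa', 'j', 'ə', '.', '_']
tones = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]
word2ph = [1, 4, 4, 4, 1, 1]


def group_by_word(items, word2ph):
    """Split a flat per-phone list into chunks of word2ph[i] items each.

    Hypothetical helper for illustration; assumes word2ph holds phone counts.
    """
    if sum(word2ph) != len(items):
        raise ValueError("word2ph must sum to the number of items")
    ends = list(accumulate(word2ph))   # exclusive end index of each chunk
    starts = [0] + ends[:-1]           # start index of each chunk
    return [items[s:e] for s, e in zip(starts, ends)]


phone_groups = group_by_word(phones, word2ph)
tone_groups = group_by_word(tones, word2ph)

# Print each token's phones next to its tones
for p, t in zip(phone_groups, tone_groups):
    print(p, t)
```

Under that assumption, the three 4-phone groups correspond to "hello", "nama" and "saya", and a tone of 1 marks the stressed vowel in each word.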
  2. Use a pretrained Malaysian BERT (melo/text/malay_bert.py).
  3. Extend the symbol set (melo/text/symbols.py).
  4. Hardcode the vocabulary size (219) and number of tones (16) to match the pretrained checkpoint during training, and use the new sizes only at inference (melo/models.py):
self.enc_p = TextEncoder(
    n_vocab if is_eval else 219,
    inter_channels,
    hidden_channels,
    filter_channels,
    n_heads,
    n_layers,
    kernel_size,
    p_dropout,
    gin_channels=self.enc_gin_channels,
    num_languages=num_languages,
    num_tones=num_tones if is_eval else 16,
)
  5. Load the official pretrained models, then extend the embedding sizes (melo/train.py):
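# Load the official pretrained generator weights; optimizer state is not restored
utils.load_checkpoint(
  hps.pretrain_G,
  net_g,
  None,
  skip_optimizer=True
)

# Grow the phoneme embedding to cover the extended symbol set
old_embeddings = net_g.module.enc_p.emb
net_g.module.enc_p.emb = net_g.module.get_resized_embeddings(old_embeddings, len(symbols))

# Grow the tone embedding from the pretrained 16 tones to 18
old_embeddings = net_g.module.enc_p.tone_emb
net_g.module.enc_p.tone_emb = net_g.module.get_resized_embeddings(old_embeddings, 18)

# Confirm the new embedding shapes
print(net_g.module.enc_p.emb.weight.shape, net_g.module.enc_p.tone_emb.weight.shape)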
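
The resizing step above can be sketched in isolation as follows. This is not the repo's `get_resized_embeddings` implementation, which is not shown here. It is a minimal PyTorch illustration of the general idea: build a larger `nn.Embedding` and copy the pretrained rows into it, so only the newly added rows start from a fresh initialization. The function name `resize_embedding`, the hidden size of 192 and the new vocabulary size of 230 are assumptions for illustration. The sizes 219, 16 and 18 come from the snippets above.

```python
import torch
import torch.nn as nn


def resize_embedding(old: nn.Embedding, new_num: int) -> nn.Embedding:
    """Return an embedding with new_num rows that keeps the overlapping rows of `old`.

    Hypothetical helper; a sketch of the technique, not the fork's actual code.
    """
    new = nn.Embedding(new_num, old.embedding_dim)
    # Keep the new table on the same device and dtype as the pretrained one
    new = new.to(device=old.weight.device, dtype=old.weight.dtype)
    n = min(old.num_embeddings, new_num)
    with torch.no_grad():
        # Copy pretrained rows; rows beyond n keep their fresh random init
        new.weight[:n].copy_(old.weight[:n])
    return new


# Stand-ins for the pretrained tables (219 symbols, 16 tones; 192 is an assumed width)
pretrained_emb = nn.Embedding(219, 192)
pretrained_tone_emb = nn.Embedding(16, 192)

resized_emb = resize_embedding(pretrained_emb, 230)       # 230 is a made-up new vocab size
resized_tone_emb = resize_embedding(pretrained_tone_emb, 18)

print(resized_emb.weight.shape, resized_tone_emb.weight.shape)
```

Building the model with the pretrained sizes first and resizing afterwards lets the checkpoint load without shape mismatches, which is presumably why the training path in melo/models.py hardcodes 219 and 16.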

Usage

The Python API and model cards can be found in this repo or on HuggingFace.

Contributing

If you find this work useful, please consider contributing to this repo.

  • Many thanks to @fakerybakery for adding the Web UI and CLI part.

Authors

Citation

@software{zhao2024melo,
  author={Zhao, Wenliang and Yu, Xumin and Qin, Zengyi},
  title = {MeloTTS: High-quality Multi-lingual Multi-accent Text-to-Speech},
  url = {https://github.com/myshell-ai/MeloTTS},
  year = {2023}
}

License

This library is under MIT License, which means it is free for both commercial and non-commercial use.

Acknowledgements

This implementation is based on TTS, VITS, VITS2 and Bert-VITS2. We appreciate their awesome work.
