Original README at https://github.com/myshell-ai/MeloTTS
MeloTTS MS is a fork of https://github.com/myshell-ai/MeloTTS that adds support for the Malay language. Models and checkpoints, including optimizer states, are released at https://huggingface.co/malaysia-ai/MeloTTS-MS
- Use the `ms` phonemizer and the Malaya Speech normalizer, melo/text/malay.py,
text = 'hello nama saya.'
text = text_normalize(text)
phones, tones, word2ph = g2p(text)
"""
(['_', 'h', 'ˈɛ', 'l', 'o', 'n', 'ˈa', 'm', 'ə', 's', 'ˈa', 'j', 'ə', '.', '_'],
 [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
 [1, 4, 4, 4, 1, 1])
"""
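The three returned lists line up with each other: `tones` has one entry per phone, and `word2ph` appears to give the number of phones each token contributes (including the `_` padding at both ends), so it sums to the length of `phones`. The following is a minimal sketch that uses only the values printed above; the helper `group_by_word` is hypothetical and not part of this repo.

```python
# Values copied from the example output above.
phones = ['_', 'h', 'ˈɛ', 'l', 'o', 'n', 'ˈa', 'm', 'ə', 's', 'ˈa', 'j', 'ə', '.', '_']
tones = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]
word2ph = [1, 4, 4, 4, 1, 1]


def group_by_word(phones, word2ph):
    """Split the flat phone list into per-token chunks using the word2ph counts.

    Hypothetical helper for illustration only.
    """
    groups, start = [], 0
    for count in word2ph:
        groups.append(phones[start:start + count])
        start += count
    return groups


# All three sequences must describe the same number of phones.
assert len(phones) == len(tones) == sum(word2ph)

groups = group_by_word(phones, word2ph)
for group in groups:
    print(''.join(group))
```

In this example, tone `1` falls exactly on the phones that carry the stress mark `ˈ`, and every other phone has tone `0`.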
- Use a pretrained Malaysian BERT, melo/text/malay_bert.py.
- Extend symbols, melo/text/symbols.py.
- Hardcode the vocabulary and tone sizes to the pretrained values (219 symbols, 16 tones) when building the model for training, but use the new sizes during inference, melo/models.py,
self.enc_p = TextEncoder(
    n_vocab if is_eval else 219,
    inter_channels,
    hidden_channels,
    filter_channels,
    n_heads,
    n_layers,
    kernel_size,
    p_dropout,
    gin_channels=self.enc_gin_channels,
    num_languages=num_languages,
    num_tones=num_tones if is_eval else 16,
)
- Load the official pretrained model first, then extend the embedding sizes, melo/train.py,
# Load the official pretrained generator weights; the optimizer state is skipped.
utils.load_checkpoint(
    hps.pretrain_G,
    net_g,
    None,
    skip_optimizer=True
)
# Resize the symbol embedding to cover the extended symbol set.
old_embeddings = net_g.module.enc_p.emb
net_g.module.enc_p.emb = net_g.module.get_resized_embeddings(old_embeddings, len(symbols))
# Resize the tone embedding to 18 tones.
old_embeddings = net_g.module.enc_p.tone_emb
net_g.module.enc_p.tone_emb = net_g.module.get_resized_embeddings(old_embeddings, 18)
print(net_g.module.enc_p.emb.weight.shape, net_g.module.enc_p.tone_emb.weight.shape)
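The implementation of `get_resized_embeddings` is not shown in this README. A common way to grow an embedding table is to allocate a larger one and copy the pretrained rows into it, so the pretrained symbols keep their vectors and only the new rows start from fresh initialization. The sketch below illustrates that idea in plain PyTorch. The function `resize_embedding`, the hidden size 192, and the new vocabulary size 230 (standing in for `len(symbols)`) are assumptions for illustration, not values from this repo.

```python
import torch
from torch import nn


def resize_embedding(old: nn.Embedding, new_num: int) -> nn.Embedding:
    """Return a new embedding with new_num rows, keeping the overlapping old rows.

    Hypothetical stand-in for illustration; not the repo's get_resized_embeddings.
    """
    new = nn.Embedding(new_num, old.embedding_dim)
    new = new.to(device=old.weight.device, dtype=old.weight.dtype)
    n = min(old.num_embeddings, new_num)
    with torch.no_grad():
        # Copy the pretrained rows; any extra rows keep their random initialization.
        new.weight[:n] = old.weight[:n]
    return new


# Pretrained sizes from the snippets above: 219 symbols and 16 tones.
emb = nn.Embedding(219, 192)       # 192 is an arbitrary hidden size (assumption)
tone_emb = nn.Embedding(16, 192)

new_emb = resize_embedding(emb, 230)        # 230 stands in for len(symbols)
new_tone_emb = resize_embedding(tone_emb, 18)
print(new_emb.weight.shape, new_tone_emb.weight.shape)
```

Because the checkpoint is loaded before resizing, the model has to be built with the pretrained sizes (219 and 16) during training, which is why those values are hardcoded in melo/models.py.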
The Python API and model cards can be found in this repo or on HuggingFace.
Contributing
If you find this work useful, please consider contributing to this repo.
- Many thanks to @fakerybakery for adding the Web UI and CLI part.
Authors
- Wenliang Zhao at Tsinghua University
- Xumin Yu at Tsinghua University
- Zengyi Qin (project lead) at MIT and MyShell
Citation
@software{zhao2024melo,
  author = {Zhao, Wenliang and Yu, Xumin and Qin, Zengyi},
  title = {MeloTTS: High-quality Multi-lingual Multi-accent Text-to-Speech},
  url = {https://github.com/myshell-ai/MeloTTS},
  year = {2023}
}
This library is released under the MIT License, which means it is free for both commercial and non-commercial use.
This implementation is based on TTS, VITS, VITS2 and Bert-VITS2. We appreciate their awesome work.