Tacotron2 Persian Text-to-Speech Model trained on ManaTTS, the largest open single-speaker Persian speech dataset with over 100 hours of high-quality audio.

MahtaFetrat/ManaTTS-Persian-Tacotron2-Model

ManaTTS-Persian-Tacotron2-Model

This repository provides a Persian text-to-speech (TTS) model trained on the ManaTTS dataset, the largest publicly accessible single-speaker Persian speech corpus. The dataset comprises over 100 hours of high-quality audio sampled at 44.1 kHz, sourced from Nasl-e-Mana magazine. The model is based on the Tacotron2 architecture and is designed to generate natural, high-quality Persian speech.

Model Weights: The trained model weights are hosted on Hugging Face. You can access them here: Persian-Tacotron2-on-ManaTTS.
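
As a rough illustration of fetching the hosted weights programmatically, the sketch below uses `snapshot_download` from the `huggingface_hub` package. The repository id is an assumption: it combines this repository's owner name with the model name mentioned above, so verify it against the Hugging Face link before relying on it. The helper name `download_weights` and the default `weights` directory are hypothetical.

```python
from pathlib import Path

# ASSUMPTION: the Hugging Face repo id is not spelled out in this README.
# It is guessed here from the GitHub owner name and the model name above.
HF_REPO_ID = "MahtaFetrat/Persian-Tacotron2-on-ManaTTS"


def download_weights(local_dir="weights", repo_id=HF_REPO_ID):
    """Download every file of the model repo and return the local folder.

    Hypothetical helper, not part of this repository's code.
    """
    # Imported lazily so that this module can be loaded without the package.
    # Install with: pip install huggingface_hub
    from huggingface_hub import snapshot_download

    # snapshot_download returns the path of the folder holding the files.
    return Path(snapshot_download(repo_id=repo_id, local_dir=local_dir))
```

Calling `download_weights()` requires network access. How the downloaded files are loaded is defined by the inference notebook, not by this sketch.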


Inference

You can use the provided inference notebook to generate speech from text.

Inference Notebook:


Output Samples

This directory contains output samples synthesized by the trained model, alongside the same utterances from three other sources: two baseline models, the natural recordings, and gold spectrograms whose waveforms are generated by the vocoder used in the study.
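
When comparing the sample sets, it can help to confirm that the files share the same sample rate and have similar durations. The following is a minimal sketch using only the Python standard library. It assumes the samples are uncompressed PCM WAV files, which this README does not state, and the function names `wav_info` and `summarize` are hypothetical.

```python
import wave
from pathlib import Path


def wav_info(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        duration = wf.getnframes() / rate
    return rate, channels, duration


def summarize(directory):
    """Map each .wav file under `directory` (relative path) to its wav_info tuple."""
    root = Path(directory)
    return {
        p.relative_to(root).as_posix(): wav_info(p)
        for p in sorted(root.rglob("*.wav"))
    }
```

For example, `summarize("samples")` would return one entry per WAV file found under a local folder named `samples` (a placeholder path).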


Ethical Use

The ManaTTS dataset and model are provided exclusively for research and development purposes. We emphasize the critical importance of ethical conduct in utilizing this dataset. Please refrain from any misuse, including but not limited to voice impersonation, identity theft, or fraudulent activities.

By accessing and using the ManaTTS dataset and model, you are obligated to uphold the highest standards of integrity and respect for user privacy. Any violation of these principles may have severe legal and ethical consequences.


Acknowledgments

We would like to express our sincere gratitude to Nasl-e-Mana, the monthly magazine of the blind community of Iran, for their generosity. Their commitment to openness and collaboration has been instrumental in advancing research and development in speech synthesis. We are especially thankful for their choice to release the data under the Creative Commons CC-0 license, allowing for unrestricted use and distribution.


Collaboration and Community Impact

We encourage researchers, developers, and the broader community to utilize the resources provided in this project, particularly in the development of high-quality screen readers and other assistive technologies to support the Iranian blind community. By fostering open-source collaboration, we aim to drive innovation and improve accessibility for all.


License

The model weights are licensed under CC0-1.0, the same license as the ManaTTS dataset.

The model implementation is based on Real-Time-Voice-Cloning, which is licensed under the MIT License. Below is the copyright statement for the original and modified works:

Modified & original work Copyright (c) 2019 Corentin Jemine (https://github.com/CorentinJ)  
Original work Copyright (c) 2018 Rayhane Mama (https://github.com/Rayhane-mamah)  
Original work Copyright (c) 2019 fatchord (https://github.com/fatchord)  
Original work Copyright (c) 2015 braindead (https://github.com/braindead)  
Modified work Copyright (c) 2025 Majid Adibian (https://github.com/Adibian)  
Modified work Copyright (c) 2025 Mahta Fetrat (https://github.com/MahtaFetrat)

Citation

If you use the ManaTTS dataset or this model in your research, please cite the following paper:

@article{fetrat2024manatts,
      title={ManaTTS Persian: A Recipe for Creating TTS Datasets for Lower-Resource Languages}, 
      author={Mahta Fetrat Qharabagh and Zahra Dehghanian and Hamid R. Rabiee},
      journal={arXiv preprint arXiv:2409.07259},
      year={2024},
}
