This repository introduces a Persian Text-to-Speech (TTS) model trained on the ManaTTS dataset, the largest publicly accessible single-speaker Persian corpus. The dataset comprises over 100 hours of high-quality audio (44.1 kHz) sourced from the Nasl-e-Mana magazine. The model is based on the Tacotron2 architecture and is designed to generate natural and high-quality Persian speech.
Model Weights: The trained model weights are hosted on Hugging Face. You can access them here: Persian-Tacotron2-on-ManaTTS.
You can use the provided inference notebook to generate speech from text.
- GitHub Notebook: inference.ipynb
- Google Colab: Open in Colab
You can find output samples synthesized by the trained model in this directory. Alongside them are the same utterances generated by two baseline models, the natural (recorded) utterances, and gold-spectrogram utterances, in which the vocoder used in the study generates the waveform from the ground-truth spectrogram.
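The gold-spectrogram condition starts from a mel spectrogram computed from the natural recording rather than one predicted by Tacotron2, so only the vocoder step is synthetic. The NumPy sketch below shows one simplified way to compute such a log-mel spectrogram. It is not the repository's feature-extraction code. The function names are hypothetical, and the parameters (16 kHz audio, 800-point FFT, 200-sample hop, 80 mel bands, 55 to 7600 Hz) are assumed values common in Tacotron2 setups. The hyperparameters, mel scale, and normalization actually used in the study may differ.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other mel formulas exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels, fmin, fmax):
    # Triangular filters whose edges are evenly spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(wav, sr=16000, n_fft=800, hop=200, n_mels=80,
                        fmin=55.0, fmax=7600.0):
    """Hypothetical helper: waveform -> log-mel spectrogram, shape (n_mels, n_frames).

    All default parameter values are assumptions, not the study's settings.
    """
    window = np.hanning(n_fft)
    # Reflect-pad by half a window so frames are centred on hop positions.
    padded = np.pad(wav, (n_fft // 2, n_fft // 2), mode="reflect")
    n_frames = 1 + (len(padded) - n_fft) // hop
    frames = np.stack([padded[t * hop: t * hop + n_fft] * window
                       for t in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, n_fft // 2 + 1)
    mel = magnitude @ mel_filterbank(sr, n_fft, n_mels, fmin, fmax).T  # (n_frames, n_mels)
    # Floor before the log to avoid log(0) on silent bins.
    return np.log(np.maximum(mel, 1e-5)).T


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    wav = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # one second of a 440 Hz tone
    print(log_mel_spectrogram(wav, sr=sr).shape)
```

Passing a spectrogram computed this way from the natural recording to the vocoder isolates vocoder error from acoustic-model error. This is why gold-spectrogram samples are a useful reference point next to the model's outputs.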
The ManaTTS dataset and model are provided exclusively for research and development purposes. We emphasize the critical importance of ethical conduct in utilizing this dataset. Please refrain from any misuse, including but not limited to voice impersonation, identity theft, or fraudulent activities.
By accessing and using the ManaTTS dataset and model, you are obligated to uphold the highest standards of integrity and respect for user privacy. Any violation of these principles may have severe legal and ethical consequences.
We would like to express our sincere gratitude to Nasl-e-Mana, the monthly magazine of the blind community of Iran, for their generosity. Their commitment to openness and collaboration has been instrumental in advancing research and development in speech synthesis. We are especially thankful for their choice to release the data under the Creative Commons CC0 license, which allows unrestricted use and distribution.
We encourage researchers, developers, and the broader community to utilize the resources provided in this project, particularly in the development of high-quality screen readers and other assistive technologies to support the Iranian blind community. By fostering open-source collaboration, we aim to drive innovation and improve accessibility for all.
- ManaTTS Dataset: Hugging Face Dataset | GitHub Repository
- Tacotron2 Implementation: GitHub Repository
- Model Weights: Hugging Face Model Repository
The model weights are licensed under CC0-1.0, the same license as the ManaTTS dataset.
The model implementation is based on Real-Time-Voice-Cloning, which is licensed under the MIT License. Below is the copyright statement for the original and modified works:
Modified & original work Copyright (c) 2019 Corentin Jemine (https://github.com/CorentinJ)
Original work Copyright (c) 2018 Rayhane Mama (https://github.com/Rayhane-mamah)
Original work Copyright (c) 2019 fatchord (https://github.com/fatchord)
Original work Copyright (c) 2015 braindead (https://github.com/braindead)
Modified work Copyright (c) 2025 Majid Adibian (https://github.com/Adibian)
Modified work Copyright (c) 2025 Mahta Fetrat (https://github.com/MahtaFetrat)
If you use the ManaTTS dataset or this model in your research, please cite the following paper:
@article{fetrat2024manatts,
title={ManaTTS Persian: A Recipe for Creating TTS Datasets for Lower-Resource Languages},
author={Mahta Fetrat Qharabagh and Zahra Dehghanian and Hamid R. Rabiee},
journal={arXiv preprint arXiv:2409.07259},
year={2024},
}