⚠️ WARNING: UNSTABLE API - INITIAL RELEASE ⚠️ This API is in its initial release (v1.0.0) and is considered unstable. Breaking changes may occur without notice. Use it in production at your own risk; it is intended for development and testing only.
A production-grade FastAPI implementation of the Zonos Text-to-Speech model.
This API is built on top of the Zonos-v0.1-hybrid and Zonos-v0.1-transformer models created by Zyphra. The models feature:
- Zero-shot TTS with voice cloning capabilities
- Support for multiple languages (100+ languages via eSpeak-ng)
- High-quality 44kHz audio output
- Fine-grained control over speaking rate, pitch, audio quality, and emotions
- Real-time performance (~2x real-time on RTX 4090)
For more information, visit the model cards on Hugging Face: Hybrid | Transformer.
- FastAPI-based REST API for Zonos Text-to-Speech model
- Support for both Transformer and Hybrid model variants
- Docker and docker-compose support with NVIDIA GPU acceleration
- Production-ready with Gunicorn workers and optimizations
- Prometheus and Grafana monitoring integration
- Health checks and comprehensive logging
- CORS support and Swagger documentation
- Voice cloning and audio continuation support
- Fine-grained emotion and audio quality control
The fastest way to get started is to use the pre-built Docker image:
docker pull ghcr.io/manascb1344/zonos-api-gpu:v1.0.0
docker run -d \
--name zonos-api-gpu \
--gpus all \
-p 8000:8000 \
-e CUDA_VISIBLE_DEVICES=0 \
ghcr.io/manascb1344/zonos-api-gpu:v1.0.0
- Clone the repository with submodules:
git clone --recursive https://github.com/manascb1344/zonos-api
cd zonos-api
Once the service is running, the API will be available at http://localhost:8000.
- Build the container:
docker build -t zonos-api .
- Run the container:
docker run -d \
--name zonos-api \
--gpus all \
-p 8000:8000 \
-e CUDA_VISIBLE_DEVICES=0 \
zonos-api
- CUDA_VISIBLE_DEVICES: Specify which GPU(s) to use (default: 0)
- USE_GPU: Enable/disable GPU usage (default: true)
- Docker with NVIDIA Container Toolkit installed
- NVIDIA GPU with CUDA support
- At least 8GB of GPU memory recommended
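The memory recommendation can be checked before starting the container. The following is a minimal sketch, not part of this project: it assumes `nvidia-smi` is on the PATH, and the helper names (`parse_memory_totals`, `query_gpu_memory_mib`, `gpus_meeting_recommendation`) are hypothetical. Treat the threshold as a rough guide, since some cards marketed as 8GB report slightly less than 8192 MiB.

```python
import subprocess

# The 8GB recommendation above, expressed in MiB (an approximation).
MIN_GPU_MEMORY_MIB = 8 * 1024


def parse_memory_totals(smi_output: str) -> list[int]:
    """Parse one integer (MiB) per non-empty line of nvidia-smi CSV output."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def query_gpu_memory_mib() -> list[int]:
    """Ask nvidia-smi for the total memory of each visible GPU, in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_memory_totals(result.stdout)


def gpus_meeting_recommendation(totals_mib: list[int]) -> list[int]:
    """Return the indices of GPUs with at least the recommended memory."""
    return [i for i, total in enumerate(totals_mib) if total >= MIN_GPU_MEMORY_MIB]
```

Calling `gpus_meeting_recommendation(query_gpu_memory_mib())` on the host yields the GPU indices that could be passed to CUDA_VISIBLE_DEVICES.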
Check if the API is running:
curl http://localhost:8000/health
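Model loading can take a while after the container starts, so a script may want to poll the health endpoint instead of checking it once. The sketch below uses only the standard library. The retry count, delay, and function names are illustrative assumptions, and it treats any HTTP 200 response as healthy without inspecting the body, whose format is not described here.

```python
import time
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_healthy(url, attempts=30, delay=2.0, probe=is_healthy, sleep=time.sleep):
    """Poll `url` up to `attempts` times, pausing `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

For example, `wait_until_healthy("http://localhost:8000/health")` blocks until the API responds or the attempts run out. The `probe` and `sleep` parameters exist only so the loop can be exercised without a running server.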
Root endpoint that returns basic API information
Returns a list of available TTS models
Returns a list of supported languages
Returns available conditioners for a specific model
Generate speech from text. Example request:
{
  "model_choice": "Zyphra/Zonos-v0.1-transformer",
  "text": "Hello, this is a test.",
  "language": "en-us",
  "emotion_values": [1.0, 0.05, 0.05, 0.05, 0.05, 0.05, 0.1, 0.2],
  "vq_score": 0.78,
  "cfg_scale": 2.0,
  "min_p": 0.15
}
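The request above could be assembled and sent from Python roughly as follows. This is a sketch under stated assumptions: the defaults simply mirror the example values, the eight-element check on `emotion_values` is inferred from the example rather than from a documented schema, and the helper names are hypothetical. The synthesis route is not named in this section, so `post_json` takes the full URL from the caller, and it returns the raw response bytes because the response format is not specified here.

```python
import json
import urllib.request

# Default taken from the example request; an assumption, not a documented default.
EXAMPLE_EMOTION_VALUES = [1.0, 0.05, 0.05, 0.05, 0.05, 0.05, 0.1, 0.2]


def build_tts_payload(
    text,
    language="en-us",
    model_choice="Zyphra/Zonos-v0.1-transformer",
    emotion_values=None,
    vq_score=0.78,
    cfg_scale=2.0,
    min_p=0.15,
):
    """Build a request body shaped like the example above."""
    if not text.strip():
        raise ValueError("text must not be empty")
    values = list(EXAMPLE_EMOTION_VALUES if emotion_values is None else emotion_values)
    if len(values) != 8:
        # Assumption: the example shows eight values, so other lengths are rejected.
        raise ValueError("emotion_values must contain 8 numbers")
    return {
        "model_choice": model_choice,
        "text": text,
        "language": language,
        "emotion_values": values,
        "vq_score": vq_score,
        "cfg_scale": cfg_scale,
        "min_p": min_p,
    }


def post_json(url, payload, timeout=120.0):
    """POST `payload` as JSON to `url` and return the raw response body."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Validating on the client side keeps obviously malformed requests from reaching the GPU worker. Look up the actual route and response type in the Swagger documentation before calling `post_json`.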
- USE_GPU: Set to "true" to enable GPU acceleration (default: true)
- PYTHONPATH: Set to the application root directory
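To illustrate how settings like these are commonly read, here is a small sketch. It is not taken from this project's source: the accepted spellings for a true value and the function names are assumptions, and only the variable names and their documented defaults come from this README.

```python
import os

# Assumption: spellings treated as "enabled"; the application may accept others.
_TRUE_VALUES = {"1", "true", "yes", "on"}


def env_flag(name, default, environ=None):
    """Read a boolean environment variable, falling back to `default` if unset."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUE_VALUES


def visible_devices(environ=None):
    """Split CUDA_VISIBLE_DEVICES into a list of device IDs (documented default: 0)."""
    environ = os.environ if environ is None else environ
    raw = environ.get("CUDA_VISIBLE_DEVICES", "0")
    return [part.strip() for part in raw.split(",") if part.strip()]
```

With no variables set, `env_flag("USE_GPU", True)` returns `True` and `visible_devices()` returns `["0"]`, matching the documented defaults.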
The API uses NVIDIA GPU acceleration by default. Make sure you have:
- NVIDIA GPU with CUDA support
- NVIDIA drivers installed
- NVIDIA Container Toolkit installed and configured
- Python 3.10+
- NVIDIA GPU with CUDA support (recommended)
- Docker and docker-compose (for containerized deployment)
# Start in development mode
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
# Or with docker-compose
docker-compose up --build
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.