From 8d78d7b3f53fe2e66ad5c87782d77f8b9002a96e Mon Sep 17 00:00:00 2001
From: Travis Van Nimwegen
Date: Wed, 22 Jan 2025 20:56:30 -0500
Subject: [PATCH 1/2] Update openai-edge-tts-integration.md

---
 .../openai-edge-tts-integration.md | 45 +++++++++++--------
 1 file changed, 27 insertions(+), 18 deletions(-)

diff --git a/docs/tutorials/text-to-speech/openai-edge-tts-integration.md b/docs/tutorials/text-to-speech/openai-edge-tts-integration.md
index 83b8aee8..ecb32891 100644
--- a/docs/tutorials/text-to-speech/openai-edge-tts-integration.md
+++ b/docs/tutorials/text-to-speech/openai-edge-tts-integration.md
@@ -9,7 +9,16 @@ This tutorial is a community contribution and is not supported by the OpenWebUI

 # Integrating `openai-edge-tts` 🗣️ with Open WebUI

-## What is `openai-edge-tts`, and how is it different from `openedai-speech`?
+## What is `openai-edge-tts`?
+
+[OpenAI Edge TTS](https://github.com/travisvn/openai-edge-tts) is a text-to-speech API that mimics the OpenAI API endpoint, allowing for a direct substitute in scenarios where the OpenAI Speech is used.
+
+It uses the [edge-tts](https://github.com/rany2/edge-tts) package, which leverages the Edge browser's free "Read Aloud" feature to emulate a request to Microsoft / Azure in order to receive very high quality text-to-speech for free.
+
+[Sample the voices here](https://tts.travisvn.com)
+
+
+ How is it different from 'openedai-speech'?

 Similar to [openedai-speech](https://github.com/matatonic/openedai-speech), [openai-edge-tts](https://github.com/travisvn/openai-edge-tts) is a text-to-speech API endpoint that mimics the OpenAI API endpoint, allowing for a direct substitute in scenarios where the OpenAI Speech endpoint is callable and the server endpoint URL can be configured.

@@ -17,13 +26,12 @@ Similar to [openedai-speech](https://github.com/matatonic/openedai-speech), [ope

 `openai-edge-tts` is a simpler option that uses a Python package called `edge-tts` to generate the audio.

-`edge-tts` ([repo](https://github.com/rany2/edge-tts)) leverages the Edge browser's free "Read Aloud" feature to emulate a request to Microsoft / Azure in order to receive very high quality text-to-speech for free.
+

 ## Requirements

 - Docker installed on your system
 - Open WebUI running
-- ffmpeg (Optional - Only required if opting to not use `mp3` format)

 ## ⚡️ Quick start

@@ -37,7 +45,7 @@ This will run the service at port 5050 with all the default configs

 ## Setting up Open WebUI to use `openai-edge-tts`

-- Open the Admin Panel and go to Settings -> Audio
+- Open the Admin Panel and go to `Settings` -> `Audio`
 - Set your TTS Settings to match the screenshot below
 - _Note: you can specify the TTS Voice here_

@@ -49,16 +57,12 @@ The default API key is the string `your_api_key_here`. You do not have to change

 **And that's it! You can end here**

-See the [Usage](#usage) section for request examples.
-
 # Please ⭐️ star the repo on GitHub if you find [OpenAI Edge TTS](https://github.com/travisvn/openai-edge-tts) useful


-:::tip
-You can define the environment variables directly in the `docker run` command. See [Quick Config for Docker](#-quick-config-for-docker) below.
-:::
-
-## Alternative Options
+
+ Running with Python
+
 ### 🐍 Running with Python

 If you prefer to run this project directly with Python, follow these steps to set up a virtual environment, install dependencies, and start the server.
@@ -100,9 +104,9 @@ Create a `.env` file in the root directory and set the following variables:
 API_KEY=your_api_key_here
 PORT=5050

-DEFAULT_VOICE=en-US-AndrewNeural
+DEFAULT_VOICE=en-US-AvaNeural
 DEFAULT_RESPONSE_FORMAT=mp3
-DEFAULT_SPEED=1.2
+DEFAULT_SPEED=1.0

 DEFAULT_LANGUAGE=en-US

@@ -125,8 +129,11 @@ The server will start running at `http://localhost:5050`.

 You can now interact with the API at `http://localhost:5050/v1/audio/speech` and other available endpoints. See the [Usage](#usage) section for request examples.

-#### Usage
+
+
+ Usage details
+
 ##### Endpoint: `/v1/audio/speech` (aliased with `/audio/speech`)

 Generates audio from the input text. Available parameters:
@@ -138,9 +145,9 @@ Generates audio from the input text. Available parameters:
 **Optional Parameters:**

 - **model** (string): Set to "tts-1" or "tts-1-hd" (default: `"tts-1"`).
-- **voice** (string): One of the OpenAI-compatible voices (alloy, echo, fable, onyx, nova, shimmer) or any valid `edge-tts` voice (default: `"en-US-AndrewNeural"`).
+- **voice** (string): One of the OpenAI-compatible voices (alloy, echo, fable, onyx, nova, shimmer) or any valid `edge-tts` voice (default: `"en-US-AvaNeural"`).
 - **response_format** (string): Audio format. Options: `mp3`, `opus`, `aac`, `flac`, `wav`, `pcm` (default: `mp3`).
-- **speed** (number): Playback speed (0.25 to 4.0). Default is `1.2`.
+- **speed** (number): Playback speed (0.25 to 4.0). Default is `1.0`.

 :::tip
 You can browse available voices and listen to sample previews at [tts.travisvn.com](https://tts.travisvn.com)
@@ -203,6 +210,8 @@ Additionally, there are endpoints for **Azure AI Speech** and **ElevenLabs** for
 These can be disabled by setting the environment variable `EXPAND_API=False`.
 :::

+
+
 ## 🐳 Quick Config for Docker

 You can configure the environment variables in the command used to run the project
@@ -211,9 +220,9 @@ You can configure the environment variables in the command used to run the proje
 docker run -d -p 5050:5050 \
   -e API_KEY=your_api_key_here \
   -e PORT=5050 \
-  -e DEFAULT_VOICE=en-US-AndrewNeural \
+  -e DEFAULT_VOICE=en-US-AvaNeural \
   -e DEFAULT_RESPONSE_FORMAT=mp3 \
-  -e DEFAULT_SPEED=1.2 \
+  -e DEFAULT_SPEED=1.0 \
   -e DEFAULT_LANGUAGE=en-US \
   -e REQUIRE_API_KEY=True \
   -e REMOVE_FILTER=False \

From ff0ef500b61e4e4dbb998714568e025414207b77 Mon Sep 17 00:00:00 2001
From: Travis Van Nimwegen
Date: Wed, 22 Jan 2025 21:07:37 -0500
Subject: [PATCH 2/2] Update openai-edge-tts-integration.md

---
 docs/tutorials/text-to-speech/openai-edge-tts-integration.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/tutorials/text-to-speech/openai-edge-tts-integration.md b/docs/tutorials/text-to-speech/openai-edge-tts-integration.md
index ecb32891..44c15f70 100644
--- a/docs/tutorials/text-to-speech/openai-edge-tts-integration.md
+++ b/docs/tutorials/text-to-speech/openai-edge-tts-integration.md
@@ -11,7 +11,7 @@ This tutorial is a community contribution and is not supported by the OpenWebUI

 ## What is `openai-edge-tts`?

-[OpenAI Edge TTS](https://github.com/travisvn/openai-edge-tts) is a text-to-speech API that mimics the OpenAI API endpoint, allowing for a direct substitute in scenarios where the OpenAI Speech is used.
+[OpenAI Edge TTS](https://github.com/travisvn/openai-edge-tts) is a text-to-speech API that mimics the OpenAI API endpoint, allowing for a direct substitute in scenarios where you can define the endpoint URL, like with Open WebUI.

 It uses the [edge-tts](https://github.com/rany2/edge-tts) package, which leverages the Edge browser's free "Read Aloud" feature to emulate a request to Microsoft / Azure in order to receive very high quality text-to-speech for free.
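
For reviewers who want to sanity-check the new defaults, here is a minimal Python sketch of a request against the `/v1/audio/speech` endpoint described in the patch. It assumes the service is running locally on port 5050 with the default key `your_api_key_here`, that the body follows the OpenAI request shape (an `input` field for the text), and that the key is sent as a Bearer token. The helper names `build_speech_request` and `synthesize` are hypothetical and not part of the project.

```python
import json
import urllib.request

# Assumed values, taken from the defaults shown in the patch.
BASE_URL = "http://localhost:5050"
API_KEY = "your_api_key_here"


def build_speech_request(text, voice="en-US-AvaNeural",
                         response_format="mp3", speed=1.0):
    """Build (but do not send) a POST request for /v1/audio/speech."""
    # The documented speed range is 0.25 to 4.0; reject anything outside it.
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    payload = {
        "model": "tts-1",
        "input": text,  # assumed field name, following the OpenAI API shape
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Bearer-token auth is an assumption mirroring the OpenAI API.
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


def synthesize(text, out_path="speech.mp3", **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

With the container from the Quick Config section running, calling `synthesize("Hello from Open WebUI!")` should write the audio to `speech.mp3`. Building the request separately from sending it keeps the payload easy to inspect without a live server.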