Merge pull request #366 from travisvn/patch-6

Update openai-edge-tts-integration.md
open-webui · Jan 29, 2025 · 5a05b5d
2 parents f7e7f11 + ff0ef50
commit 5a05b5d
Showing 1 changed file with 27 additions and 18 deletions.
diff --git a/docs/tutorials/text-to-speech/openai-edge-tts-integration.md b/docs/tutorials/text-to-speech/openai-edge-tts-integration.md
@@ -9,21 +9,29 @@ This tutorial is a community contribution and is not supported by the OpenWebUI
 
 # Integrating `openai-edge-tts` 🗣️ with Open WebUI
 
-## What is `openai-edge-tts`, and how is it different from `openedai-speech`?
+## What is `openai-edge-tts`? 
+
+[OpenAI Edge TTS](https://github.com/travisvn/openai-edge-tts) is a text-to-speech API that mimics the OpenAI API endpoint, allowing for a direct substitute in scenarios where you can define the endpoint URL, like with Open WebUI.
+
+It uses the [edge-tts](https://github.com/rany2/edge-tts) package, which leverages the Edge browser's free "Read Aloud" feature to emulate a request to Microsoft / Azure in order to receive very high quality text-to-speech for free.
+
+[Sample the voices here](https://tts.travisvn.com)
+
+<details>
+  <summary>How is it different from 'openedai-speech'?</summary>
 
 Similar to [openedai-speech](https://github.com/matatonic/openedai-speech), [openai-edge-tts](https://github.com/travisvn/openai-edge-tts) is a text-to-speech API endpoint that mimics the OpenAI API endpoint, allowing for a direct substitute in scenarios where the OpenAI Speech endpoint is callable and the server endpoint URL can be configured.
 
 `openedai-speech` is a more comprehensive option that allows for entirely offline generation of speech with many modalities to choose from.
 
 `openai-edge-tts` is a simpler option that uses a Python package called `edge-tts` to generate the audio.
 
-`edge-tts` ([repo](https://github.com/rany2/edge-tts)) leverages the Edge browser's free "Read Aloud" feature to emulate a request to Microsoft / Azure in order to receive very high quality text-to-speech for free.
+</details>
 
 ## Requirements
 
 - Docker installed on your system
 - Open WebUI running
-- ffmpeg (Optional - Only required if opting to not use `mp3` format)
 
 ## ⚡️ Quick start
 
@@ -37,7 +45,7 @@ This will run the service at port 5050 with all the default configs
 
 ## Setting up Open WebUI to use `openai-edge-tts`
 
-- Open the Admin Panel and go to Settings -> Audio
+- Open the Admin Panel and go to `Settings` -> `Audio`
 - Set your TTS Settings to match the screenshot below
 - _Note: you can specify the TTS Voice here_
 
@@ -49,16 +57,12 @@ The default API key is the string `your_api_key_here`. You do not have to change
 
 **And that's it! You can end here**
 
-See the [Usage](#usage) section for request examples.
-
 # Please ⭐️ star the repo on GitHub if you find [OpenAI Edge TTS](https://github.com/travisvn/openai-edge-tts) useful
 
-:::tip
-You can define the environment variables directly in the `docker run` command. See [Quick Config for Docker](#-quick-config-for-docker) below.
-:::
-
-## Alternative Options
 
+<details>
+  <summary>Running with Python</summary>
+
 ### 🐍 Running with Python
 
 If you prefer to run this project directly with Python, follow these steps to set up a virtual environment, install dependencies, and start the server.
@@ -100,9 +104,9 @@ Create a `.env` file in the root directory and set the following variables:
 API_KEY=your_api_key_here
 PORT=5050
 
-DEFAULT_VOICE=en-US-AndrewNeural
+DEFAULT_VOICE=en-US-AvaNeural
 DEFAULT_RESPONSE_FORMAT=mp3
-DEFAULT_SPEED=1.2
+DEFAULT_SPEED=1.0
 
 DEFAULT_LANGUAGE=en-US
 
@@ -125,8 +129,11 @@ The server will start running at `http://localhost:5050`.
 
 You can now interact with the API at `http://localhost:5050/v1/audio/speech` and other available endpoints. See the [Usage](#usage) section for request examples.
 
-#### Usage
+</details>
 
+<details>
+  <summary>Usage details</summary>
+
 ##### Endpoint: `/v1/audio/speech` (aliased with `/audio/speech`)
 
 Generates audio from the input text. Available parameters:
@@ -138,9 +145,9 @@ Generates audio from the input text. Available parameters:
 **Optional Parameters:**
 
 - **model** (string): Set to "tts-1" or "tts-1-hd" (default: `"tts-1"`).
-- **voice** (string): One of the OpenAI-compatible voices (alloy, echo, fable, onyx, nova, shimmer) or any valid `edge-tts` voice (default: `"en-US-AndrewNeural"`).
+- **voice** (string): One of the OpenAI-compatible voices (alloy, echo, fable, onyx, nova, shimmer) or any valid `edge-tts` voice (default: `"en-US-AvaNeural"`).
 - **response_format** (string): Audio format. Options: `mp3`, `opus`, `aac`, `flac`, `wav`, `pcm` (default: `mp3`).
-- **speed** (number): Playback speed (0.25 to 4.0). Default is `1.2`.
+- **speed** (number): Playback speed (0.25 to 4.0). Default is `1.0`.
 
 :::tip
 You can browse available voices and listen to sample previews at [tts.travisvn.com](https://tts.travisvn.com)
@@ -203,6 +210,8 @@ Additionally, there are endpoints for **Azure AI Speech** and **ElevenLabs** for
 These can be disabled by setting the environment variable `EXPAND_API=False`.
 :::
 
+</details>
+
 ## 🐳 Quick Config for Docker
 
 You can configure the environment variables in the command used to run the project
@@ -211,9 +220,9 @@ You can configure the environment variables in the command used to run the proje
 docker run -d -p 5050:5050 \
   -e API_KEY=your_api_key_here \
   -e PORT=5050 \
-  -e DEFAULT_VOICE=en-US-AndrewNeural \
+  -e DEFAULT_VOICE=en-US-AvaNeural \
   -e DEFAULT_RESPONSE_FORMAT=mp3 \
-  -e DEFAULT_SPEED=1.2 \
+  -e DEFAULT_SPEED=1.0 \
   -e DEFAULT_LANGUAGE=en-US \
   -e REQUIRE_API_KEY=True \
   -e REMOVE_FILTER=False \