Advanced Info
Synthalingua is a powerful, open-source, real-time audio translation tool driven by artificial intelligence. It leverages the power of OpenAI's Whisper model to provide accurate transcriptions and translations across a multitude of languages. Whether you're a language enthusiast, a gamer connecting with international communities, or simply want to enjoy content from around the world, Synthalingua has something to offer.
- Real-time Transcription and Translation: Experience near-instantaneous conversion of spoken language into text and translations.
- Multilingual Prowess: Synthalingua supports a vast library of languages, breaking down communication barriers and opening up a world of content.
- Hardware Accelerated: Harness the power of your GPU for lightning-fast processing or utilize your CPU's multi-core capabilities for efficient performance.
- Highly Customizable: Fine-tune every aspect of your experience with granular control over RAM usage, microphone sensitivity, output formats, and much more.
- Seamless Discord Integration: Share real-time transcriptions and translations directly to your Discord server, enhancing communication and accessibility.
- Live Stream Transcription: Follow along with your favorite Twitch or YouTube streams, even if they're in another language, with live transcription and translation.
- Effortless Caption Generation: Create accurate subtitles for your video and audio files, making them accessible to a wider audience.
- Intuitive Web Server Interface: View and interact with transcriptions and translations conveniently through your web browser.
Synthalingua's command-line interface is your gateway to a world of customizable audio processing. Here's a breakdown of the arguments, grouped by functionality:
Resource Management:
- --ram: Tailor the model size to your hardware's capabilities. Options: "1GB", "2GB", "4GB", "6GB", "12GB".
- --ramforce: Override safety checks and force a specific RAM allocation (use with caution!).
- --device: Choose between CPU (cpu) and GPU (cuda) processing.
- --cuda_device: Specify the GPU ID to use if you have multiple GPUs.
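To show how --device and --cuda_device fit together, here is a minimal sketch that combines them into a PyTorch-style device string such as "cuda:1". The helper name resolve_device is hypothetical and is not Synthalingua's own code; falling back to the CPU when no GPU is available is an assumption made for illustration. In a real script the cuda_available argument would typically come from torch.cuda.is_available().

```python
def resolve_device(device: str, cuda_device: int = 0, cuda_available: bool = True) -> str:
    """Combine a --device value and a --cuda_device index into one device string.

    Hypothetical helper for illustration only; the CPU fallback below is an
    assumption, not documented Synthalingua behavior.
    """
    device = device.lower()
    if device not in ("cpu", "cuda"):
        raise ValueError(f"unsupported device: {device!r}")
    if device == "cpu":
        return "cpu"
    if not cuda_available:
        return "cpu"  # assumed fallback when no GPU is present
    if cuda_device < 0:
        raise ValueError("cuda_device must be a non-negative GPU index")
    return f"cuda:{cuda_device}"  # e.g. the second GPU -> "cuda:1"


print(resolve_device("cuda", 1))                        # cuda:1
print(resolve_device("cuda", 1, cuda_available=False))  # cpu
```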
Audio Input and Control:
- --energy_threshold: Fine-tune microphone sensitivity (higher values = less sensitive).
- --mic_calibration_time: Set the microphone calibration time in seconds (0 to skip).
- --record_timeout: Control the real-time recording duration in seconds (how much audio is processed at once).
- --phrase_timeout: Set the silence duration (in seconds) that defines a new phrase.
- --list_microphones: List available microphones and their IDs.
- --set_microphone: Set the default microphone by name or ID.
- --microphone_enabled: Enable or disable microphone input (true or false).
- --stream: Provide the URL of an HLS stream for live transcription.
- --file_input: Specify the path to an audio or video file.
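The interplay between --energy_threshold and --phrase_timeout can be easier to picture with a small model. The sketch below is an assumption-laden illustration, not Synthalingua's implementation: it treats a chunk of 16-bit samples as speech when its RMS amplitude exceeds the threshold (so a higher threshold ignores quieter sound), and it starts a new phrase whenever the gap since the previous chunk is longer than phrase_timeout seconds. The names rms, is_speech, and PhraseSplitter are hypothetical, and the timestamps are assumed to be chunk arrival times.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


def rms(samples: List[int]) -> float:
    """Root-mean-square amplitude of a chunk of 16-bit PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_speech(samples: List[int], energy_threshold: float) -> bool:
    # Higher thresholds reject quieter chunks, i.e. a less sensitive microphone.
    return rms(samples) > energy_threshold


@dataclass
class PhraseSplitter:
    phrase_timeout: float  # seconds of silence that close the current phrase
    last_audio_time: Optional[float] = None
    phrases: List[List[str]] = field(default_factory=list)

    def add_chunk(self, timestamp: float, text: str) -> None:
        """Append recognized text, opening a new phrase after a long enough gap."""
        if self.last_audio_time is None or timestamp - self.last_audio_time > self.phrase_timeout:
            self.phrases.append([])  # gap exceeded phrase_timeout: start a new phrase
        self.phrases[-1].append(text)
        self.last_audio_time = timestamp


splitter = PhraseSplitter(phrase_timeout=3.0)
for t, text in [(0.0, "hello"), (1.0, "there"), (5.5, "new topic")]:
    splitter.add_chunk(t, text)
print(splitter.phrases)  # [['hello', 'there'], ['new topic']]
```

The --record_timeout setting (how much audio each chunk holds) is not modeled here; the sketch only assumes chunks arrive with timestamps.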
Language Selection and Translation:
- --language: Set the source language of the audio.
- --target_language: Set the desired output language.
- --translate: Enable translation to English.
- --transcribe: Enable transcription into the language set by --target_language.
- --auto_model_swap: Automatically switch language models based on detected speech.
- --auto_language_lock: Lock onto a detected language after a few successful recognitions.
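One way to read "lock onto a detected language after a few successful recognitions" is: once the same language has been detected N times in a row, stop re-detecting and keep using it. The sketch below models that idea. The class name LanguageLock and the default of three consecutive detections are assumptions for illustration; Synthalingua's actual rule may differ.

```python
from collections import deque
from typing import Deque, Optional


class LanguageLock:
    """Lock onto a language once it is detected `required` times in a row.

    Hypothetical model of --auto_language_lock; `required` is an assumed value.
    """

    def __init__(self, required: int = 3) -> None:
        self.required = required
        self.recent: Deque[str] = deque(maxlen=required)  # last N detections
        self.locked: Optional[str] = None

    def observe(self, detected: str) -> Optional[str]:
        """Record one successful detection and return the locked language, if any."""
        if self.locked is None:
            self.recent.append(detected)
            # Lock only when the window is full and every entry agrees.
            if len(self.recent) == self.required and len(set(self.recent)) == 1:
                self.locked = detected
        return self.locked


lock = LanguageLock(required=3)
print([lock.observe(lang) for lang in ["ja", "en", "ja", "ja", "ja"]])
# [None, None, None, None, 'ja']
```

After the lock is set, later detections are ignored, which is what keeps a brief mis-detection from switching languages mid-stream.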
Output Customization and Integration:
- --no_log: Show only the latest translation/transcription.
- --discord_webhook: Send output to a Discord webhook.
- --save_transcript: Save the transcript to a file.
- --save_folder: Specify the output folder for transcripts.
- --makecaptions: Enable caption generation mode (combine it with --file_input and the Caption Generation flags below, as in example 2).
- --portnumber: Start a web server on the specified port.
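For context on what --discord_webhook involves, a Discord webhook accepts an HTTP POST with a JSON body whose "content" field holds the message text, and Discord caps that field at 2000 characters. The sketch below builds (but does not send) one request per 2000-character piece using only the standard library. The function names are hypothetical and the webhook URL is a placeholder; this is not how Synthalingua itself is implemented.

```python
import json
import urllib.request
from typing import List

DISCORD_MAX_CONTENT = 2000  # Discord's limit for a message's "content" field


def split_message(text: str, limit: int = DISCORD_MAX_CONTENT) -> List[str]:
    """Split text into pieces of at most `limit` characters (empty text -> no pieces)."""
    return [text[i:i + limit] for i in range(0, len(text), limit)]


def build_webhook_requests(webhook_url: str, text: str) -> List[urllib.request.Request]:
    """Build one JSON POST request per piece; nothing is sent here."""
    reqs = []
    for piece in split_message(text):
        body = json.dumps({"content": piece}).encode("utf-8")
        reqs.append(urllib.request.Request(
            webhook_url,
            data=body,
            method="POST",
            headers={"Content-Type": "application/json", "User-Agent": "transcript-relay-sketch"},
        ))
    return reqs


reqs = build_webhook_requests("https://discord.com/api/webhooks/ID/TOKEN", "x" * 4500)
print(len(reqs))  # 3
# To actually post, call urllib.request.urlopen(req) for each request in order.
```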
Stream-Specific Options:
- --stream_original_text: Display the original detected text from the stream.
- --stream_chunks: Set the number of audio chunks used for stream processing (higher = more accurate, but potentially more latency).
- --stream_language: Specify the stream's language.
- --stream_target_language: Set the target language for stream translation.
- --stream_translate: Enable stream translation.
- --stream_transcribe: Enable stream transcription.
- --cookies: Specify a cookie file (Netscape format) for accessing restricted streams.
- --remote_hls_password_id & --remote_hls_password: Provide credentials for password-protected HLS streams.
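If you are unsure whether a cookie file is really in Netscape format before passing it with --cookies, Python's standard library can parse that format through http.cookiejar.MozillaCookieJar. The sketch below shows what such a file looks like (a "# Netscape HTTP Cookie File" header, then tab-separated domain, subdomain flag, path, secure flag, expiry, name, value) and loads it back. The domain, cookie name, and value are placeholders, and this only checks that the file parses; it is not part of Synthalingua.

```python
import http.cookiejar
import os
import tempfile
from typing import Dict

# Placeholder cookie in Netscape format: tab-separated
# domain, include-subdomains, path, secure, expiry (Unix time), name, value.
SAMPLE = (
    "# Netscape HTTP Cookie File\n"
    ".example.com\tTRUE\t/\tTRUE\t4102444800\tsession_id\tabc123\n"
)


def load_cookie_file(path: str) -> Dict[str, str]:
    """Parse a Netscape-format cookie file and return {name: value}."""
    jar = http.cookiejar.MozillaCookieJar(path)
    jar.load(ignore_discard=True, ignore_expires=True)  # keep session/expired cookies too
    return {cookie.name: cookie.value for cookie in jar}


with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write(SAMPLE)
try:
    print(load_cookie_file(f.name))  # {'session_id': 'abc123'}
finally:
    os.remove(f.name)
```

A file exported by a browser extension that fails to load here (for example, because the header line is missing) is likely to cause problems for the stream as well.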
Caption Generation:
- --file_output: Specify the output folder for captions.
- --file_output_name: Set the filename for the caption file (without extension).
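To give a feel for what caption generation produces, the sketch below renders Whisper-style segments (dictionaries with "start", "end", and "text", as returned by Whisper's transcribe) as SubRip cues and builds an output path from the --file_output folder and --file_output_name. The .srt extension and the helper names are assumptions for illustration; check the files Synthalingua actually writes for the exact format.

```python
import os
from typing import Dict, List, Union

Segment = Dict[str, Union[float, str]]


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02},{ms:03}"


def segments_to_srt(segments: List[Segment]) -> str:
    """Render numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{str(seg['text']).strip()}\n"
        )
    return "\n".join(cues)


def caption_path(file_output: str, file_output_name: str) -> str:
    # The name is given without an extension; ".srt" is an assumed default.
    return os.path.join(file_output, file_output_name + ".srt")


print(caption_path("subtitles", "eng_subs"))
print(segments_to_srt([{"start": 0.0, "end": 2.5, "text": " Hello there."}]))
```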
Advanced Features:
- --use_finetune: Use a fine-tuned model for potentially higher accuracy.
- --updatebranch: Choose the GitHub branch to check for updates.
- --keep_temp: Keep temporary audio files.
- --retry: Retry transcription/translation if it fails.
- --about: Display information about Synthalingua.
- --ignorelist: Specify a file with words/phrases to ignore.
- --condition_on_previous_text: Condition the model on previous text for better coherence.
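As an illustration of what an ignore list can do, the sketch below assumes a plain-text file with one word or phrase per line (blank lines and "#" comments skipped) and removes those entries from each transcription, case-insensitively and only as whole words. Both the file format and the matching rules are assumptions; Synthalingua's own --ignorelist handling may differ, and the function names are hypothetical.

```python
import re
from typing import Iterable, List


def load_ignorelist(lines: Iterable[str]) -> List[str]:
    """Read one word/phrase per line, skipping blank lines and '#' comments (assumed format)."""
    entries = []
    for line in lines:
        entry = line.strip()
        if entry and not entry.startswith("#"):
            entries.append(entry)
    return entries


def apply_ignorelist(text: str, ignored: List[str]) -> str:
    """Remove ignored entries as whole words, case-insensitively, then tidy spaces."""
    for phrase in ignored:
        # Lookarounds instead of \b so entries ending in punctuation still match.
        pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
        text = re.sub(pattern, "", text, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", text).strip()


ignored = load_ignorelist(["um\n", "\n", "# filler words\n", "thanks for watching\n"])
print(apply_ignorelist("Um I think um it works", ignored))  # I think it works
print(repr(apply_ignorelist("Thanks for watching", ignored)))  # ''
```

An empty result can then simply be skipped instead of being printed or sent to Discord.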
Here are a few examples to get you started:
1. Real-time Japanese to English translation with microphone input:
python transcribe_audio.py --ram 6gb --translate --language ja --energy_threshold 300
2. Generate English captions for a French movie:
python transcribe_audio.py --ram 12gb --makecaptions --file_input "movie.mkv" --file_output "subtitles" --file_output_name "eng_subs" --language fr
3. Transcribe a live Korean Twitch stream in real-time, displaying the original text and English translation in your browser:
python transcribe_audio.py --ram 12gb --stream_translate --stream_language ko --stream "https://www.twitch.tv/yourstream" --stream_original_text --portnumber 4000
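If you launch Synthalingua from another Python script, it can help to assemble the arguments as a list rather than a single string, so paths with spaces need no manual quoting. The sketch below turns a dictionary of options into an argument list: True becomes a bare switch such as --translate, and False or None leaves the flag out. The build_command helper is hypothetical and only reproduces the command-line syntax shown in the examples above.

```python
import shlex
from typing import Dict, List, Optional, Union

FlagValue = Optional[Union[bool, str, int]]


def build_command(options: Dict[str, FlagValue], script: str = "transcribe_audio.py") -> List[str]:
    """Turn {flag: value} into an argv list for transcribe_audio.py."""
    argv = ["python", script]
    for flag, value in options.items():
        if value is None or value is False:
            continue  # omitted flag
        argv.append(f"--{flag}")
        if value is not True:  # True means a bare switch with no value
            argv.append(str(value))
    return argv


cmd = build_command({"ram": "6gb", "translate": True, "language": "ja", "energy_threshold": 300})
print(shlex.join(cmd))
# python transcribe_audio.py --ram 6gb --translate --language ja --energy_threshold 300
# subprocess.run(cmd, check=True) would start it, run from the Synthalingua folder.
```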
For solutions to common issues, refer to the "Troubleshooting" section in the main README file. If you need further assistance or want to contribute to the project, visit the Synthalingua GitHub repository.