LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Updated Sep 24, 2024 - Python
A simple, high-quality voice conversion tool focused on ease of use and performance.
A desktop application that uses AI to translate voice between languages in real time, while preserving the speaker's tone and emotion.
Code for NeurIPS 2023 paper "DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation".
If you have ever wanted to talk to your AI Waifu with high-quality characters and character voices, try Soul of Waifu. Don't miss the opportunity to touch your dream!
💬 "Realtime" voice transcription and cloning using ElevenLabs's API.
Svelte component for using the OpenAI Realtime API
Speech-to-text-to-speech using ElevenLabs
Code for the INTERSPEECH 2023 paper "Learning When to Speak: Latency and Quality Trade-offs for Simultaneous Speech-to-Speech Translation with Offline Models"
High-quality and streaming Speech-to-Speech interactive agent in a single file. A streaming, full-duplex voice interaction agent prototype implemented in just one file!
Chatter Box is an Android app capable of voice, text, and image-text translation, as well as end-to-end chat translation.
A user-friendly interface for ElevenLabs' API with added audio transcription capability.
Systems submitted to IWSLT 2022 by the MT-UPC group.
End-to-End AI Voice Assistant pipeline with Whisper for Speech-to-Text, Hugging Face LLM for response generation, and Edge-TTS for Text-to-Speech. Features include Voice Activity Detection (VAD), tunable parameters for pitch, gender, and speed, and real-time response with latency optimization.
Speech-to-Speech translation dataset for German and English (text and speech quadruplets).
CtrlSpeak is a voice assistant activated with [Control]+Q, listening and responding only when you want.
This repository contains the code for a speech-to-speech translation system, created from scratch, for translating digits from English to Tamil.
3-month project on artificial intelligence in teams of 3 with Manon Duboscq and Léa Mariot
A lightweight tool for quickly customizing LLM chatbot workflow pipelines, such as Text-to-Text, Text-to-Speech, or Speech-to-Speech