#

asr

Here are 1,243 public repositories matching this topic...

m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

speech speech-recognition speech-to-text whisper asr

Updated Apr 12, 2025
Python

NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

machine-translation tts speech-synthesis neural-networks deeplearning speaker-recognition asr multimodal speech-translation large-language-models speaker-diariazation generative-ai

Updated Apr 16, 2025
Python

PaddlePaddle / PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

Updated Apr 16, 2025
Python

speechbrain / speechbrain

A PyTorch-based Speech Toolkit

Updated Apr 16, 2025
Python

alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node

Updated Mar 5, 2025
Jupyter Notebook

wukong-robot

wzpan / wukong-robot

🤖 wukong-robot 是一个简单、灵活、优雅的中文语音对话机器人/智能音箱项目，支持ChatGPT多轮对话能力，还可能是首个支持脑机交互的开源智能音箱项目。

alexa ai amazon-echo muse tts openai google-home unit bci speaker homeassistant snowboy asr anyq raspeberry-pi gpt3 chatgpt

Updated Oct 25, 2024
Python

k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, speaker diarization, speech enhancement, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, support 11 programming languages

Updated Apr 16, 2025
C++

TEN-framework / TEN-Agent

TEN Agent is a conversational voice AI agent powered by TEN, integrating Deepseek, Gemini, OpenAI, RTC, and hardware like ESP32. It enables realtime AI capabilities like seeing, hearing, and speaking, and is fully compatible with platforms like Dify and Coze.

Updated Apr 16, 2025
Python

FunAudioLLM / SenseVoice

Multilingual Voice Understanding Model

multilingual python ai pytorch speech-recognition speech-to-text asr cross-lingual speech-emotion-recognition audio-event-classification aigc llm gpt-4o

Updated Mar 23, 2025
Python

snakers4 / silero-models

Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple

Updated Oct 18, 2023
Jupyter Notebook

xiangyuecn / Recorder

html5 js 录音 mp3 wav ogg webm amr g711a g711u 格式，支持pc和Android、iOS部分浏览器、Hybrid App（提供Android iOS App源码）、微信，提供ASR语音识别转文字 H5版语音通话聊天示例 DTMF编码解码

Updated Mar 31, 2025
JavaScript

NexaAI / nexa-sdk

Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), Audio Language Model, auto-speech-recognition (ASR), and text-to-speech (TTS) capabilities.

audio sdk transformers tts language-model whisper asr vlm sdk-python edge-computing on-device-ml on-device-ai llm stable-diffusion

Updated Mar 6, 2025
Python

wenet-e2e / wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit

pytorch transformer speech-recognition automatic-speech-recognition production-ready whisper asr conformer e2e-models

Updated Mar 29, 2025
Python

MahmoudAshraf97 / whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

speech speech-recognition speech-to-text whisper asr speaker-diarization

Updated Mar 12, 2025
Jupyter Notebook

jdepoix / youtube-transcript-api

This is a python API which allows you to get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles and it does not require an API key nor a headless browser, like other selenium based solutions do!

python cli youtube youtube-video youtube-api captions subtitles transcript subtitle transcripts asr youtube-subtitles youtube-transcripts youtube-captions youtube-transcript translating-transcripts youtube-asr

Updated Mar 25, 2025
Python

PeterH0323 / Streamer-Sales

Streamer-Sales 销冠 —— 卖货主播 LLM 大模型🛒🎁，一个能够根据给定的商品特点从激发用户购买意愿角度出发进行商品解说的卖货主播大模型。🚀⭐内含详细的数据生成流程❗ 📦另外还集成了 LMDeploy 加速推理🚀、RAG检索增强生成 📚、TTS文字转语音🔊、数字人生成 🦸、 Agent 使用网络查询实时信息🌐、ASR 语音转文字🎙️、Vue 生态搭建前端🍍、FastAPI 搭建后端🗝️、Docker-compose 打包部署🐋

chat chatbot text-generation tts gpt chat-application asr rag digital-human llm chatgpt internlm-chat-7b internlm2 meta-human

Updated Mar 8, 2025
Python

tensorflow / lingvo

Lingvo

nlp research translation tensorflow machine-translation speech distributed tts speech-synthesis mnist speech-recognition lm seq2seq speech-to-text gpu-computing language-model asr

Updated Mar 14, 2025
Python

ahmetoner / whisper-asr-webservice

OpenAI Whisper ASR Webservice API

docker speech speech-recognition automatic-speech-recognition speech-to-text asr openai-whisper

Updated Feb 18, 2025
Python

coqui-ai / STT

🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.

deep-learning tensorflow voice-recognition speech-recognition automatic-speech-recognition speech-to-text stt asr speech-recognizer speech-recognition-api

Updated Mar 11, 2024
C++

pytorch-kaldi

mravanelli / pytorch-kaldi

pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.

deep-neural-networks deep-learning speech dnn pytorch recurrent-neural-networks lstm gru speech-recognition rnn kaldi rnn-model asr lstm-neural-networks multilayer-perceptron-network timit dnn-hmm

Updated Mar 14, 2022
Python

Improve this page

Add a description, image, and links to the asr topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the asr topic, visit your repo's landing page and select "manage topics."