Stars
InspireMusic: A Unified Framework for Music, Song, and Audio Generation.
YuE: an open foundation model for full-song music generation, similar to Suno.ai but open.
🚀🚀 "Large model": train a small 26M-parameter GPT entirely from scratch in just 2 hours! 🌏
Official repository of the paper "MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization".
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Papers, code, and resources for speech language models and end-to-end speech dialogue systems.
Awesome music generation model: MG²
Repository for training models for music source separation.
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
Utility functions for handling MIDI data in a nice/intuitive way.
Multi-task and multi-track music transcription for everyone.
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
⚡️HivisionIDPhotos: a lightweight and efficient AI tool and algorithm for making ID photos.
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
MARS5 speech model (TTS) from CAMB.AI
A simple library for Fréchet Audio Distance (FAD) calculation