
InspireMusic: A Unified Framework for Music, Song, Audio Generation.

Python 839 71 Updated Feb 18, 2025

YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open

Python 3,899 424 Updated Feb 20, 2025

🚀🚀 "Large model": train a small 26M-parameter GPT completely from scratch in just 2 hours! 🌏

Python 11,576 1,211 Updated Feb 19, 2025

Official repository of the paper "MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization".

Python 133 6 Updated Jan 9, 2025

A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.

Python 1,999 95 Updated Jan 2, 2025

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

Python 2,262 164 Updated Feb 14, 2025

HunyuanVideo: A Systematic Framework For Large Video Generation Model

Python 8,658 702 Updated Feb 20, 2025
Jupyter Notebook 38 Updated Feb 10, 2025

Paper, Code and Resources for Speech Language Model and End2End Speech Dialogue System.

157 13 Updated Nov 10, 2024

Awesome music generation model: MG²

Python 137 10 Updated Feb 5, 2025

GLM-4-Voice | End-to-end Chinese-English spoken dialogue model

Python 2,675 216 Updated Dec 5, 2024

Repository for training models for music source separation.

Python 630 86 Updated Feb 16, 2025

Local realtime voice AI

Python 2,230 122 Updated Feb 21, 2025

Target Speaker Extraction Toolkit

Python 144 16 Updated Feb 7, 2025

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 7,522 603 Updated Feb 21, 2025
Python 6 Updated Oct 4, 2024

Utility functions for handling MIDI data in a nice/intuitive way.

Jupyter Notebook 904 156 Updated Dec 11, 2024

multi-task and multi-track music transcription for everyone

130 3 Updated Nov 29, 2024

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 2,817 191 Updated Nov 14, 2024
Python 72 4 Updated Nov 22, 2024

⚡️HivisionIDPhotos: a lightweight and efficient AI ID photo tool.

Python 14,812 1,540 Updated Feb 21, 2025

Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

Python 3,158 273 Updated Nov 5, 2024

SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling

Python 1,026 73 Updated Jan 2, 2025

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.

Python 7,070 539 Updated Dec 25, 2024

Official implementation of VQ-Diffusion

Python 914 62 Updated Apr 17, 2024

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 2,603 159 Updated Feb 20, 2025

Spherical residual vector quantization (SRVQ)

Python 28 Updated Aug 25, 2024

MARS5 speech model (TTS) from CAMB.AI

Jupyter Notebook 2,620 215 Updated Aug 1, 2024

A simple library for Fréchet Audio Distance (FAD) calculation

Python 179 23 Updated Feb 11, 2025
HTML 1 1 Updated Nov 9, 2023