Audio-Modal-Models

| Model / Methods | Title | Paper Link | Code Link | Published | Keywords | Venue |
|---|---|---|---|---|---|---|
| Whisper | Robust Speech Recognition via Large-Scale Weak Supervision | Paper, Reading | Code | 2022.12.06 | openai | |
| VALL-E | Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers | Paper, Reading | Code | 2023.01.05 | | |
| VALOR | VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | Paper, Reading | Code | 2023.04.17 | | |
| SpeechGPT | SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities | Paper, Reading | Code | 2023.05.18 | SpeechInstruct | |
| VAST | VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | Paper, Reading | Code | 2023.05.29 | | |
| AudioPaLM | AudioPaLM: A Large Language Model That Can Speak and Listen | Paper, Reading | - | 2023.06.22 | google | |
| SALMONN | SALMONN: Towards Generic Hearing Abilities for Large Language Models | Paper, Reading | Code | 2023.10.20 | bytedance | |
| SpeechGPT-Gen | SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation | Paper, Reading | Code | 2024.01.24 | | |
| SpeechVerse | SpeechVerse: A Large-scale Generalizable Audio Language Model | Paper, Reading | - | 2024.05.14 | | |
| video-SALMONN | video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models | Paper, Reading | Code | 2024.06.22 | bytedance | |
| Qwen2-Audio | Qwen2-Audio Technical Report | Paper, Reading | Code | 2024.07.15 | alibaba | |
| VITA | VITA: Towards Open-Source Interactive Omni Multimodal LLM | Paper, Reading | Code | 2024.08.09 | | |
| Mini-Omni | Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming | Paper, Reading | Code | 2024.08.29 | VoiceAssistant-400K | |
| LLaMA-Omni | LLaMA-Omni: Seamless Speech Interaction with Large Language Models | Paper, Reading | Code | 2024.09.10 | InstructS2S-200K | |

Zero-Shot Multi-Speaker TTS

| Model / Methods | Title | Paper Link | Code Link | Published | Keywords | Venue |
|---|---|---|---|---|---|---|
| YourTTS | YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone | Paper, Reading | Code | 2021.12.04 | | |
| MegaTTS | Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias | Paper, Reading | - | 2023.06.06 | | |
| MegaTTS2 | Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis | Paper, Reading | Code | 2023.07.14 | | |
| XTTS | XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model | Paper, Reading | Code | 2024.06.07 | | |

TTS

| Model / Methods | Title | Paper Link | Code Link | Published | Keywords | Venue |
|---|---|---|---|---|---|---|
| InstructTTS | InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt | Paper, Reading | Code | 2023.01.31 | | |

Datasets and Repositories

https://huggingface.co/datasets/ICTNLP/ComSpeech_Datasets

https://github.com/2noise/chattts

https://github.com/suno-ai/bark

https://github.com/openai/whisper