LUCY: Linguistic Understanding and Control Yielding Early Stage of Her

🔥 News

  • 2025.03.05 🌟 We have released the code and model checkpoint.
  • 2025.01.27 🌟 We are very proud to launch LUCY, an end-to-end, full-duplex chatbot that supports voice emotion control, tool calls, and natural conversation.

👀 LUCY Overview

We are excited to present LUCY, which incorporates a series of advancements:

  1. Semantic and acoustic emotion control
  2. Real-time tool calls
  3. Human-like natural conversation

📈 Experimental Results

  • LUCY outperforms professional speech models on ASR benchmarks.

    [Figure: ASR benchmark results]

  • Emotion Control

    [Figure: emotion control results]

  • Tool Call

    [Figure: tool call results]

  • Spoken QA

    [Figure: spoken QA results]

🔧 Usage

Requirements and Installation

git clone https://github.com/VITA-MLLM/LUCY.git
cd LUCY
conda create -n lucy python=3.10 -y
conda activate lucy
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
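
After installation, a quick check can confirm that the environment looks usable before starting a long training run. The following is a minimal sketch, not part of the repository: the function name `check_environment` is hypothetical, and the assumption that `torch` is among the packages pulled in by `requirements.txt` is ours (`flash_attn` is the import name of the `flash-attn` package). It only looks the modules up and does not import them.

```python
import importlib.util
import sys


def check_environment(modules=("torch", "flash_attn"), min_python=(3, 10)):
    """Report whether the interpreter version and the named modules look usable.

    The default module list is an assumption about what the install steps
    provide; adjust it to match requirements.txt.
    """
    report = {"python_ok": sys.version_info[:2] >= min_python}
    for name in modules:
        # find_spec locates a top-level module without importing it
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'ok' if ok else 'MISSING'}")
```

Running the script inside the `lucy` conda environment prints one line per check; any `MISSING` entry suggests rerunning the corresponding `pip install` step.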

Data Preparation

  • An example JSON file for the training data:
[
    ...
    {
        "conversations": [
            {
                "role": "system",
                "content": "You are an AI robot and your name is Lucy.\n- You are a multimodal large language model developed by Tencent. Your aim is to be helpful, honest and harmless.\n- You support the ability to communicate fluently and answer user questions in multiple languages of the user's choice.\n- If the user corrects the wrong answer you generated, you will apologize and discuss the correct answer with the user."
            },
            },
            {
                "role": "user",
                "content": "能帮我写一首现代诗吗?主题是爱情,用点隐喻和象征,再加点自然景物和情感的联系。",
                "wavpath": "/mnt/data/hetinggao/Dataset/AudioQA/QA_1450K_question_tar_16khz/question_shuf_part_6/wav/0822474000bf7b3092b44bb226abeffc30f10c09f1_question.wav",
                "codec": "/mnt/data/hetinggao/Dataset/AudioQA/QA_1450K_answer_part1_tar_16khz/answer_part1_shuf_part_3/wav/00514770011b04aa28aa650a5499272436fa3198db_F10.snac"
            },
            {
                "role": "assistant",
                "content": "松林里,爱情像薄雾,轻轻缠绕。就像清晨的第一缕阳光,在夜晚中无声绽放。秋叶落下,像时光的碎片,我们在微风中分享心事,就像大地和天空的约定。每一滴露珠里,都藏着一个吻。远山如梦,却触手可及,像你在我耳边低语,那声音就像海浪拍打着沙滩,诉说着永恒的爱。爱情像长河,流淌在月光下,每一束月光都闪耀着我们的回忆。无论风雨,我们都是彼此的星光,即使经历风雨,依然坚定。这爱像天地,大地的呼吸,天空的眸光,我们在这自然的旋律中,共舞、相伴。无论风云变幻,我们的心恒久共鸣。",
                "wavpath": "/mnt/data/hetinggao/Dataset/AudioQA/QA_1450K_answer_part1_tar_16khz/answer_part1_shuf_part_3/wav/00514770011b04aa28aa650a5499272436fa3198db_F10.wav",
                "codec": "/mnt/data/hetinggao/Dataset/AudioQA/QA_1450K_question_tar_16khz/question_shuf_part_6/wav/0822474000bf7b3092b44bb226abeffc30f10c09f1_question.snac",
                "speaker": "F10"
            }
        ]
    },
    ...
]
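
Before training, it can help to check that each record has the shape shown in the example. The sketch below is an illustration under stated assumptions, not the repository's loader: the helper names `load_samples` and `validate_sample` are hypothetical, and the rule that every `user` and `assistant` turn carries both `wavpath` and `codec` is inferred from the single example above rather than from a documented schema.

```python
import json

REQUIRED_KEYS = {"role", "content"}
AUDIO_KEYS = ("wavpath", "codec")  # assumed required for spoken turns
KNOWN_ROLES = {"system", "user", "assistant"}


def load_samples(path):
    """Load the training file, expected to be a JSON list of samples."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def validate_sample(sample):
    """Return a list of problems found in one sample (empty if none)."""
    turns = sample.get("conversations")
    if not isinstance(turns, list) or not turns:
        return ["'conversations' must be a non-empty list"]
    problems = []
    for i, turn in enumerate(turns):
        if not isinstance(turn, dict):
            problems.append(f"turn {i}: not an object")
            continue
        missing = REQUIRED_KEYS - turn.keys()
        if missing:
            problems.append(f"turn {i}: missing {sorted(missing)}")
            continue
        role = turn["role"]
        if role not in KNOWN_ROLES:
            problems.append(f"turn {i}: unknown role {role!r}")
        if role in ("user", "assistant"):
            # Assumption: spoken turns reference a waveform and a codec file.
            for key in AUDIO_KEYS:
                if not isinstance(turn.get(key), str):
                    problems.append(f"turn {i}: missing {key!r}")
    return problems
```

A typical use would be to call `validate_sample` on each element returned by `load_samples` and print any non-empty result. Note that the `...` placeholders in the example above stand for elided records and must be removed before the file parses as JSON.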

Training

Stage 1

  • Aligned audio encoder is available here.
  • Alternatively, train your own aligned audio encoder (for example, a Whisper encoder):
    bash run_scripts/s1.sh
    

Stage 2 & 3

  • Run the following scripts to continue with stage 2 and stage 3 training:
    bash run_scripts/s2p0.sh
    bash run_scripts/s2p1.sh
    bash run_scripts/s3.sh
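
For unattended runs, the stage 2 and stage 3 scripts can be chained so that a failure stops the pipeline instead of letting a later stage start from an incomplete checkpoint. This is a minimal sketch under assumptions: the helper `run_stages` is hypothetical, the scripts are assumed to be run from the repository root in the order listed above, and each script is assumed to signal failure through a non-zero exit code.

```python
import subprocess

# Script paths taken from the commands above; the order is assumed to matter.
STAGE_SCRIPTS = [
    "run_scripts/s2p0.sh",
    "run_scripts/s2p1.sh",
    "run_scripts/s3.sh",
]


def run_stages(scripts=STAGE_SCRIPTS, runner=subprocess.run):
    """Run each script with bash, in order, and stop at the first failure.

    `runner` is injectable so the control flow can be exercised without
    launching real training jobs.
    """
    completed = []
    for script in scripts:
        result = runner(["bash", script])
        if result.returncode != 0:
            raise RuntimeError(f"{script} exited with code {result.returncode}")
        completed.append(script)
    return completed

# Usage, from the repository root:
#     run_stages()
```

Passing `["run_scripts/s1.sh"] + STAGE_SCRIPTS` as `scripts` would also include stage 1 for those training their own audio encoder.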
    

📐 Inference

🤖 Demo

Demo of Emotion Control

emotion_control.mp4

Demo of Function Calls

function_call.mp4

Demo of Natural Conversation

natural_conv.mp4

✒️ Citation

If you find our work helpful for your research, please consider citing it.

@article{gao2025lucy,
  title={LUCY: Linguistic Understanding and Control Yielding Early Stage of Her},
  author={Gao, Heting and Shao, Hang and Wang, Xiong and Qiu, Chaofan and Shen, Yunhang and Cai, Siqi and Shi, Yuchen and Xu, Zihan and Long, Zuwei and Zhang, Yike and others},
  journal={arXiv preprint arXiv:2501.16327},
  year={2025}
}
