We introduce the R1 paradigm to video understanding tasks and open-source the training code and data.
Note
Although our insights are not guaranteed to be correct, we commit to sharing them truthfully and honestly. We welcome community feedback and discussion to improve our understanding of multimodal reasoning models.
- [2025/02/18] We release the training code and data of Open-R1-Video!
We train Qwen2-VL-7B-Instruct on the simple video dataset open-r1-video-4k using 4 x A100 (80GB) GPUs; training uses only the video, the query, and the ground-truth answer (the letter of the correct option). We use only GRPO (pure reinforcement learning, without labeled reasoning trajectories) to train the model and observe promising rewards during training. We release our wandb logs for reference.
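For intuition, here is a minimal sketch of the kind of rule-based accuracy reward such a setup can use. It assumes the model is prompted to wrap its final choice in `<answer>...</answer>` tags; the function name and tag convention here are illustrative, not necessarily the exact ones used in this repo (see the training code for the actual reward functions).

```python
import re

def accuracy_reward(completion: str, ground_truth_letter: str) -> float:
    """Rule-based reward sketch: 1.0 if the letter inside <answer>...</answer>
    matches the ground-truth option letter, else 0.0.
    (Illustrative only; see the repo's reward code for the real logic.)"""
    match = re.search(r"<answer>\s*([A-D])\s*</answer>", completion)
    if match is None:
        return 0.0
    return 1.0 if match.group(1) == ground_truth_letter.strip().upper() else 0.0

# Example:
# accuracy_reward("<think>...</think><answer>B</answer>", "B")  -> 1.0
```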
What We Did
- Introduce R1 to Video-LMMs (e.g., Qwen2-VL), based on huggingface/open-r1 and deepseek-ai/DeepSeek-R1.
- Open-source the simple training data open-r1-video-4k.
- The reformatted data is available in open-r1-video-4k.
- The video data is available in LLaVA-Video-large-swift.
Note
The training commands below are configured for a single node with 4 x A100 (80GB) GPUs. For different hardware and topologies, you may need to tune the batch size and the number of gradient accumulation steps; a short sketch of the arithmetic follows this note.
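As a rule of thumb, keep the effective batch size constant when changing the GPU count. A minimal sketch of the arithmetic, assuming standard Hugging Face TrainingArguments-style knobs (the concrete values are illustrative, not the repo's defaults):

```python
# Effective batch size = per-device batch * gradient accumulation steps * num GPUs.
num_gpus = 4                      # e.g., 4 x A100 (80GB)
per_device_train_batch_size = 1   # illustrative value
gradient_accumulation_steps = 2   # illustrative value

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 8; on 2 GPUs, double gradient_accumulation_steps to keep 8
```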
```bash
# Clone the repo and set up the environment
git clone https://github.com/Wang-Xiaodong1899/Open-R1-Video.git
cd Open-R1-Video
conda create -n r1 python=3.10
conda activate r1
pip3 install -e ".[dev]"
pip3 install flash_attn --no-build-isolation

# Install the bundled qwen-vl-utils package
cd qwen-vl-utils
pip install -e .
cd ..
```
```bash
# Download the annotation file and put it in data/
wget https://huggingface.co/datasets/Xiaodong/open-r1-video-4k/resolve/main/LLaVA-Video-large-swift-origin.jsonl
# expected path: data/LLaVA-Video-large-swift-origin.jsonl

# Download the videos
git lfs install
git clone https://huggingface.co/datasets/malterei/LLaVA-Video-large-swift
```
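As a quick sanity check after downloading, you can inspect the first record of the annotation file. This sketch only assumes the file is standard jsonl; it prints whatever fields the dataset actually contains:

```python
import json

# Print the keys and the first record of the downloaded annotation file.
with open("data/LLaVA-Video-large-swift-origin.jsonl") as f:
    first = json.loads(f.readline())
print(sorted(first.keys()))
print(first)
```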
To run GRPO on Qwen2-VL-7B:
```bash
bash qwen-7b.sh
```
Please refer to qwen-7b.sh for more details.
Sample responses:
Ongoing...
We provide an easy reformatting method to obtain the data for GRPO training, which uses only the video, the query, and the final answer. Please refer to format_video_data.py for more details. Users can view the data in open-r1-video-4k; the original question and original answer fields come from the original dataset. A minimal sketch of this reformatting idea follows.
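The core idea is to convert a multiple-choice QA record into a (video, query, answer-letter) triple. The input/output field names below are hypothetical; format_video_data.py defines the actual schema:

```python
def reformat_record(rec: dict) -> dict:
    """Hypothetical sketch: keep only what GRPO training needs.
    Field names are illustrative; see format_video_data.py for the real schema."""
    return {
        "video": rec["video"],            # path to the video file
        "query": rec["question"],         # multiple-choice question with options
        "answer": rec["answer"].strip(),  # ground-truth option letter, e.g. "B"
    }
```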
We sincerely thank the open-source community for its contributions, including the reproductions of DeepSeek such as Open-R1 and R1-multimodal.
The related projects are as follows: