🍀 CLOVER

The official implementation of our NeurIPS 2024 paper:
Closed-Loop Visuomotor Control with Generative Expectation for Robotic Manipulation

Qingwen Bu, Jia Zeng, Li Chen, Yanchao Yang, Guyue Zhou, Junchi Yan, Ping Luo, Heming Cui, Yi Ma and Hongyang Li

📜 Preprint: 📌 Poster:

📬 If you have any questions, please feel free to contact: Qingwen Bu ( qwbu01@sjtu.edu.cn )

Full code and checkpoints release is coming soon. Please stay tuned.🦾

🔥 Highlight

🍀 CLOVER employs a text-conditioned video diffusion model to generate visual plans as reference inputs. These sub-goals then guide a feedback-driven policy, which generates actions using an error measurement strategy.
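
To make the closed-loop idea concrete, here is a toy sketch in plain Python. It is not the CLOVER implementation: the states are 2-D points rather than images, and toy_policy, l2_error, the gain and the threshold are all hypothetical stand-ins. It only illustrates the pattern of following planned sub-goals and moving on once the measured error is small enough.

```python
import math

def l2_error(state, goal):
    """Euclidean distance between the current state and a sub-goal."""
    return math.sqrt(sum((s - g) ** 2 for s, g in zip(state, goal)))

def toy_policy(state, goal, gain=0.5):
    """Hypothetical feedback policy: step a fraction of the way to the sub-goal."""
    return [gain * (g - s) for s, g in zip(state, goal)]

def run_closed_loop(state, sub_goals, threshold=0.05, max_steps=100):
    """Follow sub-goals in order, advancing only when the measured error is small."""
    state = list(state)
    steps = 0
    for goal in sub_goals:
        # Keep acting on feedback until this sub-goal is reached (or we give up).
        while l2_error(state, goal) > threshold and steps < max_steps:
            action = toy_policy(state, goal)
            state = [s + a for s, a in zip(state, action)]
            steps += 1
    return state, steps

# Stand-in for a visual plan: three 2-D waypoints instead of generated frames.
sub_goals = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
final_state, steps = run_closed_loop([0.0, 0.0], sub_goals)
print(final_state, steps)
```

Because every action is computed from the current error rather than replayed open-loop, a perturbed state is simply corrected on the next step.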

Owing to the closed-loop attribute, CLOVER is robust to visual distraction and object variation:

This closed-loop mechanism enables achieving the desired states accurately and reliably, thereby facilitating the execution of long-term tasks:

cook-fish.mp4

📢 News

[2024/09/16] We released our paper on arXiv.
[2024/12/01] We have open-sourced the entire codebase and will keep it updated. Please give it a try!

📌 TODO list

Training script for visual planner
Checkpoints release (Scheduled Release Date: Mid-October, 2024)
Evaluation codes on CALVIN (Scheduled Release Date: Mid-October, 2024)
Policy training codes on CALVIN (Estimated Release Period: November, 2024)

🎮 Getting started

Our training is conducted with PyTorch 1.13.1, CUDA 11.7, Ubuntu 22.04, and NVIDIA Tesla A100 (80 GB) GPUs. The closed-loop evaluation on CALVIN is run on a system with an NVIDIA RTX 3090.

We also tested PyTorch 2.2.0 with CUDA 11.8, and training works fine with this combination as well.

(Optional) We use conda to manage the environment.

conda create -n clover python=3.8
conda activate clover

Install dependencies.

pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
pip install git+https://github.com/hassony2/torch_videovision
pip install -e .
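
After installation, it can be handy to confirm that your environment matches one of the two PyTorch/CUDA combinations mentioned above. The helper below is a small sketch for that purpose; is_tested_combo and TESTED_COMBOS are illustrative names and not part of this repository.

```python
# Version pairs named in this README; other combinations are untested here.
TESTED_COMBOS = {("1.13.1", "11.7"), ("2.2.0", "11.8")}

def strip_local_tag(version):
    """Drop a local build tag such as '+cu117' from a version string."""
    return version.split("+", 1)[0]

def is_tested_combo(torch_version, cuda_version):
    """Return True if (PyTorch, CUDA) matches a pair reported in this README."""
    return (strip_local_tag(torch_version), cuda_version) in TESTED_COMBOS

# Typical use once PyTorch is installed:
#   import torch
#   print(is_tested_combo(torch.__version__, torch.version.cuda))
```

A result of False does not mean the setup is broken, only that the combination is not one of those reported here.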

Install the CALVIN simulator.

git clone --recurse-submodules https://github.com/mees/calvin.git
export CALVIN_ROOT=$(pwd)/calvin
cd $CALVIN_ROOT
sh install.sh
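
Since CALVIN_ROOT is only exported in the current shell, it is easy to lose it in a new terminal. As a minimal sketch, a script could verify the variable before launching anything; resolve_calvin_root is a hypothetical helper and not part of CALVIN or this repository.

```python
import os
from pathlib import Path

def resolve_calvin_root(env=None):
    """Return CALVIN_ROOT as a Path, or raise if it is unset or not a directory."""
    env = os.environ if env is None else env
    value = env.get("CALVIN_ROOT")
    if not value:
        raise RuntimeError("CALVIN_ROOT is not set; export it before running.")
    root = Path(value).expanduser()
    if not root.is_dir():
        raise RuntimeError(f"CALVIN_ROOT does not point to a directory: {root}")
    return root

# Typical use: print(resolve_calvin_root())
```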

💿 Checkpoints

We release the model weights of our Visual Planner and Feedback-driven Policy on Hugging Face.

Training of Visual Planner

Requirement

The visual planner requires 24 GB of GPU VRAM with a per-GPU batch size of 4, a video length of 8, and an image size of 128.

Preparation

- We use OpenAI-CLIP to encode task instructions for conditioning.

Initiate training of the visual planner (video diffusion model) on CALVIN

Please modify accelerate_cfg.yaml first according to your setup.

accelerate launch --config_file accelerate_cfg.yaml train.py \
    --learning_rate 1e-4 \
    --train_num_steps 300000 \
    --save_and_sample_every 10000 \
    --train_batch_size 32 \
    --sample_per_seq 8 \
    --sampling_step 5 \
    --with_text_conditioning \
    --diffusion_steps 100 \
    --sample_steps 10 \
    --with_depth \
    --flow_reg \
    --results_folder *path_to_save_your_ckpts*
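
For planning disk space, it may help to estimate how many checkpoints the command above produces. The sketch below assumes that a checkpoint is written every --save_and_sample_every steps, as the flag name suggests; that behaviour is an assumption here, and checkpoint_steps is an illustrative helper rather than part of train.py.

```python
def checkpoint_steps(train_num_steps, save_every):
    """List the step counts at which a periodic save would fire."""
    if save_every <= 0:
        raise ValueError("save_every must be positive")
    return list(range(save_every, train_num_steps + 1, save_every))

# Values taken from the command above.
steps = checkpoint_steps(300_000, 10_000)
print(len(steps), steps[0], steps[-1])
```

Multiplying the number of checkpoints by the size of one saved checkpoint gives a rough upper bound on the space needed in the results folder.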

Training of Feedback Policy

Preparation

- Only VC-1 is currently supported as the visual encoder. Please set up the environment and download the pre-trained checkpoints by following eai-vc.
- Set your calvin_dataset_path in FeedbackPolicy/train_calvin.sh.

Initiate training of the Feedback-driven Policy (Inverse Dynamics Model) on CALVIN

cd ./FeedbackPolicy
bash train_calvin.sh

Evaluation

Preparation

1. Set your CALVIN and checkpoint paths in FeedbackPolicy/eval_calvin.sh.
2. We train our policy with an input size of 192×192, so please modify the VC-1 config file accordingly: set img_size: 192 and use_cls: False.

Initiate evaluation on CALVIN simply with

cd ./FeedbackPolicy
bash eval_calvin.sh
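
The VC-1 config change from the preparation step can be made by hand in a text editor. If you prefer to script it, the following is a minimal sketch that rewrites `key: value` lines in YAML text using only the standard library. The keys img_size and use_cls come from the step above; set_yaml_scalar and the example excerpt are hypothetical, and the real VC-1 config layout may differ.

```python
import re

def set_yaml_scalar(text, key, value):
    """Replace the value of every `key: value` line, keeping its indentation."""
    pattern = re.compile(rf"^([ \t]*){re.escape(key)}[ \t]*:.*$", re.MULTILINE)
    new_text, count = pattern.subn(lambda m: f"{m.group(1)}{key}: {value}", text)
    if count == 0:
        raise KeyError(f"{key!r} not found in config text")
    return new_text

# Hypothetical excerpt; the real VC-1 config has more fields and may differ.
example = "model:\n  img_size: 224\n  use_cls: True\n"
patched = set_yaml_scalar(example, "img_size", 192)
patched = set_yaml_scalar(patched, "use_cls", False)
print(patched)
```

To apply it to a real file, read the file's text, pass it through set_yaml_scalar for each key, and write the result back. Keep a backup of the original config, since this line-based approach does not understand nested YAML structure.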

📝 Citation

If you find the project helpful for your research, please consider citing our paper:

@article{bu2024clover,
  title={Closed-Loop Visuomotor Control with Generative Expectation for Robotic Manipulation},
  author={Bu, Qingwen and Zeng, Jia and Chen, Li and Yang, Yanchao and Zhou, Guyue and Yan, Junchi and Luo, Ping and Cui, Heming and Ma, Yi and Li, Hongyang},
  journal={arXiv preprint arXiv:2409.09016},
  year={2024}
}

Acknowledgements

We thank AVDC and RoboFlamingo for their open-sourced work!
