
GraspLDM: Generative 6-DoF Grasp Synthesis using Latent Diffusion Models

Kuldeep Barad · Andrej Orsula · Antoine Richard · Jan Dentler · Miguel Olivares-Mendez · Carol Martinez

ArXiv   |   Video

Vision-based grasping of unknown objects in unstructured environments is a key challenge for autonomous robotic manipulation. A practical grasp synthesis system is required to generate a diverse set of 6-DoF grasps from which a task-relevant grasp can be executed. Although generative models are suitable for learning such complex data distributions, existing models have limitations in grasp quality, long training times, and a lack of flexibility for task-specific generation. In this work, we present GraspLDM, a modular generative framework for 6-DoF grasp synthesis that uses diffusion models as priors in the latent space of a VAE. GraspLDM learns a generative model of object-centric SE(3) grasp poses conditioned on point clouds. GraspLDM's architecture enables us to train task-specific models efficiently by only re-training a small denoising network in the low-dimensional latent space, as opposed to existing models that need expensive re-training. Our framework provides robust and scalable models on both full and partial point clouds. GraspLDM models trained with simulation data transfer well to the real world without any further fine-tuning. Our models provide an 80% success rate for 80 grasp attempts of diverse test objects across two real-world robotic setups.
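Conceptually, sampling proceeds as follows: the point cloud is encoded into a conditioning feature, a diffusion prior denoises Gaussian noise in the low-dimensional VAE latent space, and the VAE decoder maps the resulting latents to SE(3) grasp poses. The sketch below is purely illustrative pseudocode; all class and method names are hypothetical and are not the API of this repository.

# Illustrative sketch of grasp sampling: a diffusion prior in the VAE latent
# space, conditioned on the object point cloud.
# All names below (pc_encoder, denoiser.denoise_step, grasp_decoder, latent_dim)
# are hypothetical and not part of this codebase.
import torch

def sample_grasps(pointcloud, pc_encoder, denoiser, grasp_decoder, num_grasps=20, steps=100):
    cond = pc_encoder(pointcloud)                           # point-cloud conditioning feature
    z = torch.randn(num_grasps, grasp_decoder.latent_dim)   # Gaussian noise in the latent space
    for t in reversed(range(steps)):
        z = denoiser.denoise_step(z, t, cond)               # learned reverse-diffusion step
    return grasp_decoder(z, cond)                           # decode latents to SE(3) grasp poses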

Pre-requisites

  1. Python >= 3.8
  2. CUDA > 11.1 and compatible Nvidia driver
  3. (Only for Docker) Nvidia container toolkit

Setup

You can set up a Python environment using Conda or virtualenv. Alternatively, to avoid issues with system libraries, you can use a Docker container or a VS Code Devcontainer.

  1. Conda

    conda env create -f environment.yml
    conda activate grasp_ldm
    
  2. virtualenv

    python -m venv grasp_ldm
    source grasp_ldm/bin/activate
    pip install -r requirements.txt
    
  3. Docker

    • Use the helper scripts to build a Docker image and run the container.

    NOTE: Executing bash scripts may not always be safe. Double check before executing.

    cd .docker
    chmod +x build.sh run.sh
    
    # Build the image
    ./build.sh
    
    # Run a container
    ./run.sh
    
  4. Devcontainer

    • Open the command palette (Ctrl+Shift+P), start typing Dev Containers: Reopen in Container and select it to start the devcontainer.

    • When you wish to rebuild after a change, use Dev Containers: Rebuild and Reopen in Container.

    • For more info on Dev Containers, refer to: ...

Prepare Data

  1. Download the ACRONYM dataset using the instructions given in nvlabs/acronym.

  2. Download the train/test split data from 🤗 HuggingFace: kuldeepbarad/GraspLDM/splits. A scripted alternative is sketched below.
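If you prefer to script the download, the split files can also be fetched with the huggingface_hub client. This is a minimal sketch that assumes the splits sit under a splits/ folder of the kuldeepbarad/GraspLDM repository and that data/ACRONYM is your data root; adjust the patterns and paths to the actual layout.

# Minimal sketch: fetch the train/test splits from the Hugging Face Hub.
# Assumes the files live under "splits/" in the kuldeepbarad/GraspLDM repo;
# pass repo_type="dataset" instead if the splits are hosted as a dataset repo.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="kuldeepbarad/GraspLDM",
    allow_patterns=["splits/*"],   # only download the split files
    local_dir="data/ACRONYM",      # default --data_root used by the tools
)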

Run Generation Demo on ShapeNet Point Clouds

  1. Download the pretrained models from the 🤗 HuggingFace repository kuldeepbarad/GraspLDM.

  2. Run the demo script using a pretrained model:

    python tools/generate_grasps.py --exp_path <path-to-experiment-folder> --mode VAE --visualize
    
    # Example
    python tools/generate_grasps.py --exp_path checkpoints/generation/fpc_1a_latentc3_z4_pc64_simple_140k_noatt --mode VAE --visualize
    All options:
    • --exp_path: Path to the experiment checkpoint
    • --data_root: Root directory for data (default: "data/ACRONYM")
    • --mode: Model type to use, either 'VAE' or 'LDM' (default: 'VAE')
    • --split: Data split to use (default: "test")
    • --num_grasps: Number of grasps to generate (default: 20)
    • --visualize: Enable visualization
    • --no_ema: Disable EMA model usage
    • --num_samples: Number of samples to generate (default: 11)
    • --inference_steps: Number of inference steps for LDM (default: 100)
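    For example, to sample with the latent diffusion prior rather than the plain VAE, the same experiment folder can be run in LDM mode. The command below is only a sketch and assumes this experiment folder also contains a trained DDM checkpoint:

    # Example (illustrative): LDM sampling with 50 denoising steps and 40 grasps
    python tools/generate_grasps.py --exp_path checkpoints/generation/fpc_1a_latentc3_z4_pc64_simple_140k_noatt --mode LDM --inference_steps 50 --num_grasps 40 --visualize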

Run Training on ACRONYM Dataset

Train grasp sampling models (VAE, DDM) with multi-GPU support.

NOTE: Training is done in two stages. First the VAE is trained, and then the denoising diffusion model in its latent space.

# Basic usage
## 1. First train the VAE
python tools/train_generator.py --config configs/generation/fpc/fpc_1a_latentc3_z4_pc64_180k.py --model vae
## 2. Then train the DDM once VAE checkpoints are available.
python tools/train_generator.py --config configs/generation/fpc/fpc_1a_latentc3_z4_pc64_180k.py --model ddm

Optional usage examples:

# Multi-GPU training
python tools/train_generator.py --config configs/generation/fpc/fpc_1a_latentc3_z4_pc64_180k.py --model vae --num-gpus 4 --batch-size 32

# DDM training - NOTE: DDM training can only be done once the VAE model for this experiment has been trained
python tools/train_generator.py --config configs/generation/fpc/fpc_1a_latentc3_z4_pc64_180k.py --model ddm --seed 42
All options:
  • --config, -c: Path to config file
  • --model, -m: Model type (classifier, vae, ddm)
  • --root-dir, -d: Data root directory
  • --num-gpus, -g: Number of GPUs
  • --batch-size, -b: Batch size per device
  • --deterministic: Enable deterministic training
  • --seed: Random seed
  • -debug: Disable wandb logging
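For instance, the flags above can be combined for a reproducible multi-GPU run. This command is illustrative and assumes the same config file as the examples above:

# Deterministic VAE training on 2 GPUs using the short-form flags listed above
python tools/train_generator.py -c configs/generation/fpc/fpc_1a_latentc3_z4_pc64_180k.py -m vae -g 2 -b 16 --deterministic --seed 42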

Attribution

If you find this code useful, please cite our work:

@article{barad2023graspldm,
  title={GraspLDM: Generative 6-DoF Grasp Synthesis using Latent Diffusion Models},
  author={Barad, Kuldeep R and Orsula, Andrej and Richard, Antoine and Dentler, Jan and Olivares-Mendez, Miguel and Martinez, Carol},
  journal={arXiv preprint arXiv:2312.11243},
  year={2023}
}

License

Apache 2.0 License. See LICENSE for more details.

Acknowledgements/External Resources
