IRVLUTD/Proto-CLIP

Proto-CLIP: Vision-Language Prototypical Network for Few-Shot Learning

Code release for Proto-CLIP [ arXiv | Project-Page ]

Dataset

  • To download the datasets, please follow the instructions in DATASET.md.
  • To download the FewSOL dataset variants [52 | 198], please use this link.
  • Note: Place all datasets in the DATA/ directory.
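Before launching an experiment, a quick shell check can confirm that the datasets ended up where the scripts expect them. This is a minimal sketch, not part of the repository: the function name `check_data_dir` is hypothetical, and it only assumes that each dataset was extracted into its own subdirectory of DATA/ (the actual folder names come from DATASET.md and are not assumed here).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with Proto-CLIP: verify that the dataset
# root exists and print the dataset folders it contains.
check_data_dir() {
  local root="${1:-DATA}"   # the README asks for all datasets under DATA/
  if [ ! -d "$root" ]; then
    echo "missing directory: $root/ (place all datasets here)" >&2
    return 1
  fi
  # collect top-level subdirectory names; assumes one subdirectory per dataset
  local dirs
  dirs=$(find "$root" -mindepth 1 -maxdepth 1 -type d -exec basename {} \; | sort)
  if [ -z "$dirs" ]; then
    echo "$root/ exists but contains no dataset folders" >&2
    return 1
  fi
  # print the dataset folder names, sorted, one per line
  printf '%s\n' "$dirs"
}
```

Run `check_data_dir` from the repository root (it defaults to DATA), or pass another path explicitly; a non-zero exit status means the directory is missing or empty.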

Setup

# create conda environment
conda create -n proto-clip python=3.9

# activate the environment
conda activate proto-clip

# install dependencies
pip install -r requirements.txt

# install matching versions of torch, torchvision and torchaudio (CUDA 11.7 build)
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
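Because the install commands above only affect whichever conda environment is active, a small guard can confirm that proto-clip is the active one before installing or running anything. A minimal sketch, assuming only that conda exports the active environment's name in CONDA_DEFAULT_ENV on activation; the function name `require_proto_clip_env` is hypothetical and not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical guard, not part of Proto-CLIP: refuse to continue unless the
# conda environment created in the setup steps is active.
require_proto_clip_env() {
  local expected="${1:-proto-clip}"   # environment name used in the setup steps
  # conda sets CONDA_DEFAULT_ENV to the name of the active environment
  if [ "${CONDA_DEFAULT_ENV:-}" != "$expected" ]; then
    echo "expected conda env '$expected', found '${CONDA_DEFAULT_ENV:-none}'" >&2
    echo "run: conda activate $expected" >&2
    return 1
  fi
  echo "conda env '$expected' is active"
}
```

Once the guard passes, `python -c "import torch; print(torch.__version__, torch.cuda.is_available())"` prints the installed PyTorch version and whether a CUDA device is visible, which is a quick way to confirm that the CUDA build was picked up.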

Alias

Run

CUDA_VISIBLE_DEVICES=<GPU_ID> \
python main.py \
--config <config-file> \
--dataset <dataset-alias> \
--logs tb_logs \
--alpha <alpha> \
--beta <beta> \
--adapter <adapter-alias> \
<vl-flag> \
<test-flag>
  • config-file : Path to the experiment's configuration file. Default config files are in the configs/ directory.
  • dataset-alias : Alias of the dataset to use for the experiment.
  • alpha : Value of the alpha hyperparameter for the selected dataset.
  • beta : Value of the beta hyperparameter for the selected dataset.
  • adapter-alias : Alias of the adapter to use for the experiment.
  • vl-flag : Use "" to also train the text memory, or "--train_vis_memory_only" to train only the visual memory.
  • test-flag : Use "" to train, or "--only_test" to run testing only.
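The placeholders above combine into a single command line. The sketch below is a hypothetical wrapper (not part of the repository) that assembles the main.py invocation from those options and prints it, so a run can be reviewed before it is launched. It uses only the flags documented above; the function name `build_proto_clip_cmd` and the argument order are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Proto-CLIP: build the main.py command
# from the options documented in the Run section and print it.
# Arguments: CONFIG DATASET ALPHA BETA ADAPTER VIS_ONLY TEST_ONLY
#   VIS_ONLY  = yes -> add --train_vis_memory_only (skip text-memory training)
#   TEST_ONLY = yes -> add --only_test (run testing only)
build_proto_clip_cmd() {
  if [ "$#" -ne 7 ]; then
    echo "usage: build_proto_clip_cmd CONFIG DATASET ALPHA BETA ADAPTER VIS_ONLY TEST_ONLY" >&2
    return 2
  fi
  local cmd=(python main.py
    --config "$1" --dataset "$2" --logs tb_logs
    --alpha "$3" --beta "$4" --adapter "$5")
  # vl-flag: empty also trains the text memory; the flag trains visual memory only
  if [ "$6" = "yes" ]; then
    cmd+=(--train_vis_memory_only)
  fi
  # test-flag: empty trains; --only_test runs testing only
  if [ "$7" = "yes" ]; then
    cmd+=(--only_test)
  fi
  # join the array with single spaces (first character of the default IFS)
  printf '%s\n' "${cmd[*]}"
}
```

For example, `build_proto_clip_cmd cfg.yaml my_dataset 0.5 1.0 my_adapter yes no` prints the command for a visual-memory-only training run; every argument there is a placeholder, not a recommended setting. Prefix the printed line with `CUDA_VISIBLE_DEVICES=<GPU_ID>` to launch it. Because the wrapper joins arguments with spaces, keep config paths free of spaces.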

Note: Please use main.qt.py for experiments involving Proto-CLIP-F-QT.

Tensorboard

tensorboard --logdir tb_logs

Proto-CLIP Toolkit

Demo: User-command-oriented robot grasping on a Fetch robot using Proto-CLIP predictions.
For the real-world demo, please use proto-clip-toolkit (sample code). The PyPI package is available here.
Please check the pretrained checkpoints for use with proto-clip-toolkit.
NOTE: Use the dataset that corresponds to the checkpoint.

Links

Contact

The following three options are available for any clarifications, comments, or suggestions:

Citation

Please cite Proto-CLIP if it helps your research:

@INPROCEEDINGS{padalunkal2024protoclip,
  author={P, Jishnu Jaykumar and Palanisamy, Kamalesh and Chao, Yu-Wei and Du, Xinya and Xiang, Yu},
  title={{Proto-CLIP: Vision-Language Prototypical Network for Few-Shot Learning}}, 
  keywords={Training;Representation learning;Adaptation models;Three-dimensional displays;Prototypes;Benchmark testing;Object recognition;Few shot learning;Intelligent robots},
  booktitle={2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)}, 
  doi={10.1109/IROS58592.2024.10801660},
  pages={2594-2601},
  year={2024}
}