Official Implementation of "From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models"
Our paper "From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models" has been accepted to CVPR 2024.
- Create a conda environment and install PyTorch:

```bash
conda create -n pix2sgg python=3.8
conda activate pix2sgg

# CUDA 11.8
conda install pytorch==2.0.0 torchvision==0.15.0 pytorch-cuda=11.8 -c pytorch -c nvidia
# or CUDA 10.2
conda install pytorch==1.10.2 torchvision==0.11.3 cudatoolkit=10.2 -c pytorch
```
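A quick sanity check that the install can see your GPU (a generic PyTorch check, not specific to this project):

```bash
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```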
- Install the other dependencies:

```bash
pip install -r requirements_pix2sgg.txt
# Hugging Face transformers version: v4.29.2
```
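If pip resolved a different transformers release, it can be pinned to the version noted above:

```bash
pip install transformers==4.29.2
```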
Our work is built upon LAVIS and shares most of its requirements.
- Build the project:

```bash
python setup.py build develop
```
See DATASET.md for dataset preprocessing instructions.
The model weights can be downloaded from: https://huggingface.co/rj979797/PGSG-CVPR2024/tree/main
Open-vocabulary SGG:

| Datasets | Novel+base mR50/100 | Novel+base R50/100 | Novel mR50/100 | Checkpoint |
|---|---|---|---|---|
| VG | 6.2/8.3 | 15.1/18.4 | 3.7/5.2 | vg_ov_sgg.pth |
| VG-SGCls | 9.7/13.8 | 26.8/33.2 | 5.1/7.7 | vg_ov_sgg.pth |
| PSG | 15.3/17.7 | 23.7/25.4 | 6.7/9.6 | psg_ov_sgg.pth |
Standard SGG:

| Datasets | mR50/100 | R50/100 | Checkpoint |
|---|---|---|---|
| VG | 9.0/11.5 | 17.7/20.7 | vg_sgg.pth |
| PSG | 14.5/17.6 | 25.8/28.9 | psg_sgg.pth |
| VG-c | 10.4/12.7 | 20.3/23.6 | vg_sgg_close_clser.pth |
| PSG-c | 21.2/22.0 | 34.9/36.1 | psg_sgg_close_clser.pth |
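The checkpoints can be fetched in the usual Hugging Face ways, for example (the ./checkpoints target directory is just an example, and the CLI route needs a recent huggingface_hub):

```bash
# option 1: clone the whole model repo (requires git-lfs)
git lfs install
git clone https://huggingface.co/rj979797/PGSG-CVPR2024

# option 2: fetch a single checkpoint with the huggingface_hub CLI
huggingface-cli download rj979797/PGSG-CVPR2024 vg_ov_sgg.pth --local-dir ./checkpoints
```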
Our PGSG is trained using the BLIP pre-trained weights, accessible here.
Before training or evaluation, make sure the checkpoint paths in the configuration file (*.yaml) are correct: training loads the checkpoint specified by `model.pretrained`, while evaluation loads the checkpoint specified by `model.finetuned`.
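As a rough sketch, the relevant configuration fields look like this (the paths below are placeholders, not the repository's actual defaults):

```yaml
model:
  # loaded for training (e.g. the BLIP pre-trained weights)
  pretrained: "/path/to/blip_pretrained.pth"
  # loaded for evaluation (e.g. a checkpoint from the model zoo above)
  finetuned: "/path/to/vg_ov_sgg.pth"
```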
Open-vocabulary SGG on VG

Training
```bash
python -m torch.distributed.run --master_port 13919 --nproc_per_node=4 train.py --cfg-path lavis/projects/blip/train/vrd_vg_ft_pgsg_ov.yaml --job-name VG-pgsg_ovsgg
```
Evaluation
```bash
python -m torch.distributed.run --master_port 13958 --nproc_per_node=4 evaluate.py --cfg-path lavis/projects/blip/eval/rel_det_vg_pgsg_eval_ov.yaml --job-name VG-pgsg_ovsgg-eval
```
Standard SGG on VG

Training
```bash
python -m torch.distributed.run --master_port 13919 --nproc_per_node=4 train.py --cfg-path lavis/projects/blip/train/vrd_vg_ft_pgsg.yaml --job-name VG-pgsg_stdsgg
```
Evaluation
```bash
python -m torch.distributed.run --master_port 13958 --nproc_per_node=4 evaluate.py --cfg-path lavis/projects/blip/eval/rel_det_vg_pgsg_eval.yaml --job-name VG-pgsg_stdsgg-eval
```
Open-vocabulary SGG on PSG

Training
```bash
python -m torch.distributed.run --master_port 13919 --nproc_per_node=4 train.py --cfg-path lavis/projects/blip/train/vrd_psg_ft_pgsg_ov.yaml --job-name psg-pgsg_ovsgg
```
Evaluation
```bash
python -m torch.distributed.run --master_port 13958 --nproc_per_node=4 evaluate.py --cfg-path lavis/projects/blip/eval/rel_det_psg_ov.yaml --job-name psg-pgsg_ovsgg-eval
```
Standard SGG on PSG

Training
```bash
python -m torch.distributed.run --master_port 13919 --nproc_per_node=4 train.py --cfg-path lavis/projects/blip/train/vrd_psg_ft_pgsg.yaml --job-name psg-pgsg_stdsgg
```
Evaluation
```bash
python -m torch.distributed.run --master_port 13958 --nproc_per_node=4 evaluate.py --cfg-path lavis/projects/blip/eval/rel_det_psg_eval.yaml --job-name psg-pgsg_stdsgg-eval
```
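The commands above assume 4 GPUs; adjust --nproc_per_node (and --master_port, if it clashes with another job) to match your machine. For example, a single-GPU training run with otherwise identical flags:

```bash
python -m torch.distributed.run --master_port 13919 --nproc_per_node=1 train.py --cfg-path lavis/projects/blip/train/vrd_vg_ft_pgsg.yaml --job-name VG-pgsg_stdsgg
```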
If you find this project helpful for your research, please consider citing our paper:
```bibtex
@misc{li2024pixels,
      title={From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models},
      author={Rongjie Li and Songyang Zhang and Dahua Lin and Kai Chen and Xuming He},
      year={2024},
      eprint={2404.00906},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```
This repository is built on LAVIS and borrows code from the scene graph benchmark framework of SGTR.