We use open-vocabulary segmentation to predict part and whole labels for the CAST segments. In this paper, we apply the OVSeg framework, which predicts labels for masked images; unlike OVSeg, we do not fine-tune CLIP on these masked images.
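To make "predicts labels for masked images" concrete, here is a minimal sketch of the masking step only. It assumes the segmentation is stored as an integer array with one segment id per pixel. The function name `masked_crops` and the gray fill value are illustrative assumptions, not taken from the OVSeg or CAST code, and the classification of each crop with CLIP is left out.

```python
import numpy as np


def masked_crops(image, segments, fill=127):
    """Yield (segment_id, crop) pairs, one per segment.

    image:    H x W x 3 uint8 array.
    segments: H x W integer array, one segment id per pixel.
    fill:     value written to pixels outside the segment (an assumption).
    """
    for seg_id in np.unique(segments):
        mask = segments == seg_id
        # Tight bounding box around the segment.
        rows = np.flatnonzero(mask.any(axis=1))
        cols = np.flatnonzero(mask.any(axis=0))
        r0, r1 = rows[0], rows[-1] + 1
        c0, c1 = cols[0], cols[-1] + 1
        # Copy so the input image is left untouched.
        crop = image[r0:r1, c0:c1].copy()
        # Blank out pixels in the box that do not belong to the segment.
        crop[~mask[r0:r1, c0:c1]] = fill
        yield int(seg_id), crop
```

Each crop would then be passed to an open-vocabulary classifier to obtain a label for that segment.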
We provide Jupyter notebooks for predicting segmentation maps and running the evaluations. The segmentations are saved first and then reused in the subsequent evaluations.
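The save-then-reuse pattern can be sketched as follows. This is an illustration under assumptions, not the code used in the notebooks: the helper name `load_or_compute`, the `.npy` storage format, and the `compute_fn` callback are all hypothetical.

```python
from pathlib import Path

import numpy as np


def load_or_compute(cache_path, compute_fn):
    """Return the cached segmentation if present, else compute and save it."""
    # np.save appends ".npy" when it is missing, so normalize the suffix
    # up front to keep the existence check and the save path consistent.
    cache_path = Path(cache_path).with_suffix(".npy")
    if cache_path.exists():
        return np.load(cache_path)
    seg = np.asarray(compute_fn())
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    np.save(cache_path, seg)
    return seg
```

With a helper like this, the evaluation step can call `load_or_compute` with the same path and skip the expensive segmentation pass when the file already exists.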
- SAM
> pip install git+https://github.com/facebookresearch/segment-anything.git
- OVSeg. Follow the installation guide of OVSeg.
- Download the PartImageNet_OOD dataset from GitHub. Decompress the zip file and place its contents under
./data
./data/PartImageNet
|------ annotations/
| |------ val.json
| |------ train.json
| |------ test.json
|
|------ images/
       |------ val/
       |------ train/
       |------ test/
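
Before running the notebooks, it can help to confirm that the dataset was unpacked into the layout above. The following is a small sketch using only the standard library; the function name `missing_entries` is hypothetical, and the default root assumes the `./data/PartImageNet` location shown above.

```python
from pathlib import Path

# Entries copied from the directory layout above.
EXPECTED = [
    "annotations/val.json",
    "annotations/train.json",
    "annotations/test.json",
    "images/val",
    "images/train",
    "images/test",
]


def missing_entries(root="./data/PartImageNet"):
    """Return the expected files or folders that are absent under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

Calling `missing_entries()` returns an empty list when everything is in place, and otherwise lists the relative paths that still need to be added.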
- Save hierarchical segmentation:
- Visualize open-vocabulary segmentation:
- Evaluate open-vocabulary segmentation: