# Getting started with self-supervised learning of CAST

There are two architectural variants of CAST proposed in the paper: one for image classification and the other for segmentation. Both architectures can be pre-trained with self-supervised learning. In the paper, we use the MoCo-v3 framework for all self-supervised learning experiments.

We provide bash scripts for running the self-supervised experiments. By default, we use CAST-S. You can use larger models, e.g., CAST-B, by replacing `-a cast_small` with `-a cast_base` in the scripts, as sketched below.
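A minimal sketch of this swap, assuming the `-a` flag appears verbatim in the script; the copied script name is only a placeholder:

```bash
# Minimal sketch: copy a provided script and switch the architecture flag.
# The output script name is a placeholder; adjust paths to your checkout.
sed 's/-a cast_small/-a cast_base/' scripts/moco/train_imagenet1k_cast.sh \
    > scripts/moco/train_imagenet1k_cast_base.sh
bash scripts/moco/train_imagenet1k_cast_base.sh
```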

## Model architecture

(Architecture diagrams: (a) CAST for classification and (b) CAST for segmentation, each taking an input image.)
## Pre-train on ImageNet for classification

  1. Self-supervised learning of CAST on ImageNet-1K (an expanded launch sketch follows this list):
> bash scripts/moco/train_imagenet1k_cast.sh
  2. Self-supervised learning of CAST on ImageNet-100:
> bash scripts/moco/train_imagenet100_cast.sh
  3. Self-supervised learning of ViT on ImageNet-1K:
> bash scripts/moco/train_imagenet1k.sh
  4. Self-supervised learning of ViT on ImageNet-100:
> bash scripts/moco/train_imagenet100.sh
  5. In the paper, we ablate the efficacy of our Graph Pooling module by replacing it with the Token Merging (ToMe) module; both models use superpixel tokens. Run the following bash script to reproduce our ablation study of the Token Merging module on ImageNet-100:
> bash scripts/moco/train_imagenet100_tome.sh
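
For reference, the scripts above wrap a MoCo-v3-style training launcher. The following is a hypothetical, expanded invocation, assuming the repository keeps MoCo-v3's `main_moco.py` entry point and flag names; the hyperparameters and dataset path below are illustrative placeholders, not the settings used in the paper:

```bash
# Hypothetical expansion of a pre-training script (e.g., train_imagenet1k_cast.sh).
# Flag names follow the public MoCo-v3 launcher; all values are placeholders.
python main_moco.py \
  -a cast_small \
  -b 1024 --epochs 300 --warmup-epochs 40 \
  --optimizer adamw --lr 1.5e-4 --weight-decay 0.1 \
  --moco-m-cos --moco-t 0.2 \
  --multiprocessing-distributed --world-size 1 --rank 0 \
  --dist-url 'tcp://localhost:10001' \
  /path/to/imagenet  # ImageNet folder with train/ and val/ subdirectories
```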

## Pre-train on COCO for segmentation

  1. Self-supervised learning of CAST on COCO:
> bash scripts/moco/train_coco_cast.sh