Prevention of colorectal cancer has become a global health issue. In clinical practice, doctors usually use colonoscopy to detect polyps, but accurately segmenting polyps from colonoscopy images is a challenging task. To address this challenge, many CNN-based methods have been proposed; however, pure CNN-based methods have limitations. To overcome these limitations, we propose a novel architecture, TransInvNet, for accurate polyp segmentation in colonoscopy images. More specifically, we combine the recently proposed involution network (RedNet) with a Vision Transformer in two parallel branches and fuse their output features. Based on the fused features, we use a simple decoder with skip connections to increase the resolution while decreasing the number of channels step by step. Finally, we propose an attention segmentation module that combines an attention map with a reverse attention map, which helps distinguish the polyp from its surrounding tissue and improves segmentation accuracy. Our method achieves strong results on the Kvasir dataset (mDice 0.910) and also generalizes well to unseen datasets (ETIS, CVC-ColonDB, EndoScene).
Figure 1: Architecture of the proposed TransInvNet, which consists of two parallel branches (RedNet and ViT) followed by a simple decoder.
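As a rough illustration of the two-branch design, the sketch below runs a convolution/involution-style branch and a transformer branch on the same input and fuses their feature maps with concatenation and a 1x1 convolution. The module names and the fusion choice here are assumptions for illustration, not the exact layers of TransInvNet.

```python
import torch
import torch.nn as nn

class TwoBranchEncoderSketch(nn.Module):
    """Illustrative two-branch encoder: an involution-based branch (e.g. RedNet)
    and a ViT-style branch process the same image; their feature maps are fused."""

    def __init__(self, inv_branch: nn.Module, vit_branch: nn.Module, channels: int):
        super().__init__()
        self.inv_branch = inv_branch   # placeholder for the RedNet branch
        self.vit_branch = vit_branch   # placeholder for the ViT branch
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x):
        f_inv = self.inv_branch(x)     # (B, C, H, W)
        f_vit = self.vit_branch(x)     # (B, C, H, W), reshaped from tokens upstream
        # Concatenate along channels and mix with a 1x1 convolution.
        return self.fuse(torch.cat([f_inv, f_vit], dim=1))
```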
Figure 2: Architecture of the attention segmentation module.
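The snippet below sketches the idea behind the attention segmentation module: a coarse prediction yields an attention map (sigmoid) and a reverse attention map (1 - sigmoid), the features are weighted by both, and the coarse prediction is refined. The specific layers are illustrative assumptions, not the module's actual implementation.

```python
import torch
import torch.nn as nn

class AttentionSegmentationSketch(nn.Module):
    """Illustrative combination of an attention map and its reverse."""

    def __init__(self, channels: int):
        super().__init__()
        self.refine = nn.Conv2d(2 * channels, 1, kernel_size=3, padding=1)

    def forward(self, feat, coarse_pred):
        attn = torch.sigmoid(coarse_pred)    # highlights likely polyp regions
        rev_attn = 1.0 - attn                # highlights surrounding tissue
        fg = feat * attn                     # foreground-weighted features
        bg = feat * rev_attn                 # background-weighted features
        # Residual refinement of the coarse prediction from both views.
        return coarse_pred + self.refine(torch.cat([fg, bg], dim=1))
```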
Our train/test split policy follows PraNet: Parallel Reverse Attention Network for Polyp Segmentation. 900 images from Kvasir-SEG and 550 images from CVC-ClinicDB are used for training, while the remaining images of these two datasets, together with CVC-ColonDB, ETIS, and the test set of EndoScene, are used for testing.
Figure 3: Quantitative results on the Kvasir-SEG and CVC-ClinicDB datasets.
Figure 4: Quantitative results on the ETIS, EndoScene and CVC-ColonDB datasets.
Figure 5: Qualitative results of our proposed TransInvNet compared to PraNet and HarDNet-MSEG.
.
├── cal_params.py
├── eval.py
├── images
│ ├── framework.png
│ ├── qualitiveresult.png
│ ├── quantitativeresult1.png
│ ├── quantitativeresult2.png
│ └── segmentationhead.png
├── inference.py
├── README.md
├── requirements.txt
├── train.py
├── TransInvNet
│ ├── model
│ │ ├── backbone
│ │ │ ├── base_backbone.py
│ │ │ ├── builder.py
│ │ │ └── rednet.py
│ │ ├── basic_blocks.py
│ │ ├── config.py
│ │ ├── decoder
│ │ │ └── decoder.py
│ │ ├── model.py
│ │ └── vit
│ │ └── vit.py
│ └── utils
│ ├── dataloader.py
│ ├── involution_cuda.py
│ └── utils.py
In our experiments, all training and testing are conducted with PyTorch on a single RTX 2080 Ti GPU.
- Install required libraries:
pip install -r requirements.txt
- Download necessary data:
We use five datasets in our experiments: Kvasir-SEG, CVC-ClinicDB, CVC-ColonDB, ETIS and EndoScene. We use the same split policy as PraNet, and you can download these datasets from their repo. Thanks to their great work.
- Download the train dataset. This dataset can be downloaded from this link (Google Drive). Configure your train_path to the directory of the train dataset.
- Download the test dataset. This dataset can be downloaded from this link (Google Drive). Configure your test_path to the directory of the test dataset.
- Download pretrained weights:
Download the pretrained weights for ViT and RedNet. A large part of our code is from ViT-Pytorch and Involution. Thanks for their wonderful work.
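Once downloaded, the backbone checkpoints are loaded before training. The helper below is only a minimal sketch of loading a standard PyTorch checkpoint; the checkpoint layout is an assumption, and the ViT .npz weights from ViT-Pytorch may require that repository's own loading utilities instead.

```python
import torch

def load_backbone_weights(model: torch.nn.Module, ckpt_path: str) -> None:
    """Sketch: load a downloaded checkpoint into a backbone module.
    Assumes a plain PyTorch state dict (possibly wrapped in a 'state_dict' key)."""
    state = torch.load(ckpt_path, map_location="cpu")
    if isinstance(state, dict) and "state_dict" in state:
        state = state["state_dict"]
    # strict=False tolerates keys that differ between checkpoint and model.
    missing, unexpected = model.load_state_dict(state, strict=False)
    print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
```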
python train.py --epoch --lr --batch_size --accmulation --img_size --clip --cfg --train_path --test_path --output_path --seed
For detailed information about each argument, please use python train.py --help.
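For example, a hypothetical invocation might look like the following; all values here are illustrative only and are not the settings used in our experiments:
python train.py --epoch 100 --lr 1e-4 --batch_size 16 --img_size 352 --train_path ./data/TrainDataset --test_path ./data/TestDataset --output_path ./output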
To run inference with our proposed TransInvNet, you can either download our pretrained weights from this link or train a model yourself. After downloading the pretrained weights of TransInvNet or finishing training, configure your weight_path to point to the trained weights and test_path to the images you would like to run inference on. Then use the following command:
python inference.py --img_size --weight_path --test_path --output_path --threshold
For detailed information about each argument, please use python inference.py --help
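The --threshold argument controls how the predicted probability map is binarized. The snippet below is a minimal sketch of that step under the common convention of applying a sigmoid and keeping pixels above the threshold; the default value of 0.5 is an assumption, not necessarily the script's default.

```python
import torch

def to_binary_mask(logits: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """Sketch: turn raw logits (B, 1, H, W) into a binary polyp mask."""
    probs = torch.sigmoid(logits)        # probabilities in [0, 1]
    return (probs > threshold).float()   # 1 = polyp, 0 = background
```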
Our evaluation code is modified from link. To evaluate a model, configure weight_path to the trained weights and test_path to the dataset you would like to evaluate on. You can use the following command to run the evaluation script:
python eval.py --img_size --weight_path --test_path
For detailed information about each argument, please use python eval.py --help
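For reference, metrics such as the reported mDice are built on the Dice coefficient. The snippet below is a minimal sketch of that formula; the evaluation script's actual implementation (modified from the linked code) may compute it differently and report additional metrics.

```python
import torch

def dice_coefficient(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Sketch: mean Dice over a batch of binary masks shaped (B, 1, H, W)."""
    pred = pred.float().flatten(1)       # (B, H*W)
    target = target.float().flatten(1)   # (B, H*W)
    inter = (pred * target).sum(dim=1)
    dice = (2 * inter + eps) / (pred.sum(dim=1) + target.sum(dim=1) + eps)
    return dice.mean()
```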
- The code of the Vision Transformer part is borrowed from ViT-Pytorch.
- The code of the involution part is borrowed from involution.
- The datasets used in our experiments are from PraNet.