Qiang Hu¹, Mei Liu², Qiang Li¹†, Zhiwei Wang¹†
¹ WNLO, HUST; ² Tongji Medical College, HUST
(†: corresponding author)
For the first time, we reduce the annotation cost to a single annotated frame per polyp video, regardless of the video's length. To this end, we introduce a new task, First-Frame Supervised Video Polyp Segmentation (FSVPS), and propose a novel Propagative and Semantic Dual-Teacher Network (PSDNet). PSDNet adopts a teacher-student framework with two distinct types of teachers: a propagative teacher and a semantic teacher. The propagative teacher is a universal object tracker that propagates the first-frame annotation to subsequent frames as pseudo labels. However, tracking errors can accumulate over time, gradually degrading the pseudo labels and misguiding the student model. To address this, we introduce the semantic teacher, an exponential moving average (EMA) of the student model, which produces more stable, time-invariant pseudo labels. PSDNet merges the pseudo labels from both teachers using a carefully designed back-propagation strategy, which assesses the quality of each pseudo label by tracking it backward to the first frame. High-quality pseudo labels are more likely to align spatially with the first-frame annotation after this backward tracking, so favoring them yields more accurate teacher-to-student knowledge transfer and better segmentation performance.
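The two mechanisms described above, the EMA update of the semantic teacher and the backward-tracking quality check used to merge pseudo labels, can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the PSDNet implementation: model parameters are plain dicts of float arrays instead of PyTorch modules, `backward_track` is a hypothetical callable standing in for the propagative teacher run in reverse, IoU is an assumed alignment measure, and score-weighted averaging is only one plausible way to merge the two teachers' labels.

```python
import numpy as np
from typing import Callable, Dict, Tuple

Params = Dict[str, np.ndarray]  # assumed: parameter name -> float array


def ema_update(teacher: Params, student: Params, momentum: float = 0.99) -> None:
    """Semantic teacher update (in place): teacher = m * teacher + (1 - m) * student.

    Teacher arrays must be floating point, since they are modified in place.
    """
    for name, s_param in student.items():
        teacher[name] *= momentum
        teacher[name] += (1.0 - momentum) * s_param


def iou(a: np.ndarray, b: np.ndarray, threshold: float = 0.5, eps: float = 1e-6) -> float:
    """IoU between two masks after binarization (assumed alignment measure)."""
    a_bin, b_bin = a > threshold, b > threshold
    inter = np.logical_and(a_bin, b_bin).sum()
    union = np.logical_or(a_bin, b_bin).sum()
    return float((inter + eps) / (union + eps))


def fuse_pseudo_labels(
    label_prop: np.ndarray,
    label_sem: np.ndarray,
    first_frame_mask: np.ndarray,
    backward_track: Callable[[np.ndarray], np.ndarray],
) -> Tuple[np.ndarray, float, float]:
    """Hypothetical fusion: weight each teacher's soft pseudo label by how well it
    aligns with the first-frame annotation after being tracked back to frame 0."""
    score_prop = iou(backward_track(label_prop), first_frame_mask)
    score_sem = iou(backward_track(label_sem), first_frame_mask)
    # eps in iou() keeps both scores strictly positive, so the sum is never zero.
    fused = (score_prop * label_prop + score_sem * label_sem) / (score_prop + score_sem)
    return fused, score_prop, score_sem
```

With an identity `backward_track` (a toy stand-in), a propagative label that matches the first-frame mask scores 1.0 while an empty semantic label scores close to zero, so the fused label is dominated by the propagative teacher. A hard selection (keep only the higher-scoring label) would be an equally simple alternative.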
Performance on SUN-SEG (Dice):

| Model | Backbone | Seen-Easy | Seen-Hard | Unseen-Easy | Unseen-Hard | Weights |
|---|---|---|---|---|---|---|
| PSDNet | PVT | 0.900 | 0.860 | 0.798 | 0.806 | ckpts |
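The table reports the Dice coefficient, Dice = 2|P ∩ G| / (|P| + |G|) for a predicted mask P and a ground-truth mask G. Below is a minimal NumPy sketch of per-frame Dice and a plain mean over frames. The 0.5 binarization threshold and the both-empty convention are assumptions; the official SUN-SEG evaluation toolbox may threshold and aggregate scores differently (for example, per video or across multiple thresholds).

```python
import numpy as np
from typing import Sequence


def dice(pred: np.ndarray, gt: np.ndarray, threshold: float = 0.5) -> float:
    """Per-frame Dice = 2|P ∩ G| / (|P| + |G|) after binarizing both masks."""
    p = pred > threshold
    g = gt > threshold
    inter = np.logical_and(p, g).sum()
    denom = p.sum() + g.sum()
    if denom == 0:
        return 1.0  # both masks empty: counted as perfect agreement (assumed convention)
    return float(2.0 * inter / denom)


def mean_dice(preds: Sequence[np.ndarray], gts: Sequence[np.ndarray]) -> float:
    """Unweighted mean of per-frame Dice scores."""
    return float(np.mean([dice(p, g) for p, g in zip(preds, gts)]))
```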
- Python 3.8+
- PyTorch 1.9+
- TorchVision matching your PyTorch version
- NVIDIA GPU + CUDA
```bash
cd PSDNet
# Install the remaining dependencies
pip install -r requirements.txt
# Build the CUDA extensions for FA
cd lib/ops_align
python setup.py build develop
cd ../..
```
Please refer to PNS+ to get access to the SUN-SEG dataset, then download it to `./datasets`. The directory structure should be as follows:

```
PSDNet
├── datasets
│   ├── SUN-SEG
│   │   ├── TestEasyDataset
│   │   │   ├── Seen
│   │   │   ├── Unseen
│   │   ├── TestHardDataset
│   │   │   ├── Seen
│   │   │   ├── Unseen
│   │   ├── TrainDataset
```
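Before testing, a small check like the following can confirm that the dataset layout above is in place. This is a hypothetical helper, not part of the repository; it assumes it is run from the repository root and only checks the directories listed in the tree, not their contents, since the frame and ground-truth sub-folders are not specified here.

```python
from pathlib import Path
from typing import List

# Directories from the expected layout, relative to the repository root.
EXPECTED_DIRS = [
    "datasets/SUN-SEG/TestEasyDataset/Seen",
    "datasets/SUN-SEG/TestEasyDataset/Unseen",
    "datasets/SUN-SEG/TestHardDataset/Seen",
    "datasets/SUN-SEG/TestHardDataset/Unseen",
    "datasets/SUN-SEG/TrainDataset",
]


def missing_dataset_dirs(repo_root: str = ".") -> List[str]:
    """Return the expected dataset directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
```

An empty list means every expected split directory exists; otherwise the returned paths show what is still missing.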
Run testing with:

```bash
python test_video.py
```
Thanks to XMem for providing an efficient implementation of universal video object segmentation, which is used as the propagative teacher model in this work.
If you find our paper and code useful in your research, please consider giving this repository a star ⭐ and citing our paper 📝:
```bibtex
@article{hu2024first,
  title={First-frame Supervised Video Polyp Segmentation via Propagative and Semantic Dual-teacher Network},
  author={Hu, Qiang and Liu, Mei and Li, Qiang and Wang, Zhiwei},
  journal={arXiv preprint arXiv:2412.16503},
  year={2024}
}
```