Paper: arXiv:2505.21904
A three-stage pipeline that leverages limited labels and abundant unlabeled data to train a compact student model via contrastive pixel-wise distillation.
CAST is built around three key steps:
- **Adaptation**: Warm up the student on the small labeled set using standard instance-segmentation losses.
- **Contrastive Self-Supervision**: Pull pixel-level features of positive proposals closer in embedding space, while pushing negatives apart (see the loss sketch below).
- **Knowledge Distillation**: Transfer instance-level predictions and refined feature representations from a large teacher to the compact student (see the distillation sketch below).
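A minimal sketch of the contrastive self-supervision step, using an InfoNCE-style objective over pixel embeddings from matched proposals. The function name, tensor shapes, temperature value, and the way positives/negatives are gathered are illustrative assumptions, not the repository's actual implementation:

```python
import torch
import torch.nn.functional as F


def pixel_contrastive_loss(anchor_feats, positive_feats, negative_feats, temperature=0.1):
    """InfoNCE-style loss over pixel embeddings from matched proposals (sketch).

    anchor_feats:   (N, D) student pixel embeddings for anchor proposals
    positive_feats: (N, D) embeddings of the matching (positive) proposals
    negative_feats: (M, D) embeddings of non-matching (negative) proposals
    """
    anchor = F.normalize(anchor_feats, dim=1)
    positive = F.normalize(positive_feats, dim=1)
    negative = F.normalize(negative_feats, dim=1)

    # Similarity of each anchor to its positive (N, 1) and to all negatives (N, M).
    pos_sim = (anchor * positive).sum(dim=1, keepdim=True) / temperature
    neg_sim = anchor @ negative.t() / temperature

    # Each anchor's positive should win the softmax over [positive, negatives].
    logits = torch.cat([pos_sim, neg_sim], dim=1)
    targets = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, targets)
```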
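And a minimal sketch of the distillation step, combining feature alignment with softened prediction matching against a frozen teacher. The loss weights, temperature, and head shapes are assumptions for illustration only:

```python
import torch.nn.functional as F


def distillation_loss(student_feats, teacher_feats, student_logits, teacher_logits,
                      feat_weight=1.0, logit_weight=1.0, temperature=2.0):
    """Feature-map plus instance-prediction distillation (sketch).

    student_feats / teacher_feats:   (B, C, H, W) feature maps
    student_logits / teacher_logits: (B, K) per-instance class logits
    """
    # Feature distillation: match the teacher's refined representations pixel-by-pixel.
    feat_loss = F.mse_loss(student_feats, teacher_feats.detach())

    # Prediction distillation: soften both distributions and match them with KL divergence.
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits.detach() / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2

    return feat_weight * feat_loss + logit_weight * kd_loss
```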
This design yields a student model that nearly matches its teacher on standard benchmarks despite using far fewer parameters and FLOPs.
*Figures: model statistics and benchmark performance comparison.*
If you use this code or ideas from CAST, please cite:
@article{taghavi2025cast,
title = {CAST: Contrastive Adaptation and Distillation for Semi-Supervised Instance Segmentation},
author = {Taghavi, Pardis and Liu, Tian and Li, Renjie and Langari, Reza and Tu, Zhengzhong},
journal = {arXiv preprint arXiv:2505.21904},
year = {2025}
}

