Sudhanshu Mittal, Maxim Tatarchenko, Thomas Brox
The ability to understand visual information from limited labeled data is an important aspect of machine learning. While image-level classification has been extensively studied in a semi-supervised setting, dense pixel-level classification with limited data has only recently drawn attention. In this work, we propose an approach for semi-supervised semantic segmentation that learns from a limited number of pixel-wise annotated samples while exploiting additional annotation-free images. It uses two network branches that link semi-supervised classification with semi-supervised segmentation, including self-training. The dual-branch approach reduces both the low-level and the high-level artifacts that are typical when training with few labels. The approach attains significant improvement over existing methods, especially when trained with very few labeled samples. On several standard benchmarks (PASCAL VOC 2012, PASCAL-Context, and Cityscapes), the approach achieves a new state of the art in semi-supervised learning.
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Segmentation | PASCAL Context 12.5% labeled | Validation mIoU | 35.3 | s4GAN+MLMT (DeepLab v2 ImageNet pre-trained) |
| Semantic Segmentation | PASCAL Context 25% labeled | Validation mIoU | 37.8 | s4GAN+MLMT (DeepLab v2 ImageNet pre-trained) |
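
The table reports validation mIoU (mean intersection over union). As a reminder of how that metric is usually computed, here is a minimal sketch in plain Python. It is not the authors' evaluation code. The function name `mean_iou`, the `ignore_index=255` void label, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over the classes that occur in pred or target.

    pred, target: 2D lists of integer class labels with the same shape.
    ignore_index: label marking void pixels to skip (assumed convention).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if t == ignore_index:
                continue  # void pixels do not count toward any class
            if p == t:
                intersection[p] += 1
                union[p] += 1
            else:
                # A mismatch enlarges the union of both classes involved.
                union[p] += 1
                union[t] += 1
    # Classes with an empty union are left out of the average.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


pred = [[0, 0, 1],
        [1, 1, 2]]
target = [[0, 1, 1],
          [1, 1, 255]]
print(mean_iou(pred, target, num_classes=3))  # 0.625
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 3/4, so the mean is 0.625. Benchmark scores are normally obtained by accumulating the intersection and union counts over the whole validation set before dividing, rather than averaging per-image values.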