Xiaokang Chen, Yuhui Yuan, Gang Zeng, Jingdong Wang
In this paper, we study the semi-supervised semantic segmentation problem by exploring both labeled data and extra unlabeled data. We propose a novel consistency regularization approach, called cross pseudo supervision (CPS). Our approach imposes consistency on two segmentation networks that are perturbed with different initializations and receive the same input image. The pseudo one-hot label map output by one perturbed segmentation network is used to supervise the other segmentation network with the standard cross-entropy loss, and vice versa. The CPS consistency has two roles: it encourages high similarity between the predictions of the two perturbed networks for the same input image, and it expands the training data by using the unlabeled data with pseudo labels. Experimental results show that our approach achieves state-of-the-art semi-supervised segmentation performance on Cityscapes and PASCAL VOC 2012. Code is available at https://git.io/CPS.
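
The cross supervision described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that); it is a pure-Python toy under stated assumptions: each network's output is a list of per-pixel class logits, the one-hot pseudo label is stored as the argmax class index, and the helper names `softmax`, `cross_entropy`, `pseudo_labels`, and `cps_loss` are hypothetical. It covers only the CPS term, not any supervised loss on labeled data.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, targets):
    """Mean cross-entropy over pixels.

    logits:  list of pixels, each a list of class logits
    targets: list of class indices, one per pixel
    """
    losses = [-math.log(softmax(p)[t]) for p, t in zip(logits, targets)]
    return sum(losses) / len(losses)


def pseudo_labels(logits):
    """Hard (one-hot) pseudo label per pixel, stored as the argmax index."""
    return [max(range(len(p)), key=p.__getitem__) for p in logits]


def cps_loss(logits_a, logits_b):
    """Each network is supervised by the other network's pseudo labels.

    Assumption: in a real training loop the pseudo labels would be
    treated as constants, so no gradient flows through them.
    """
    loss_a = cross_entropy(logits_a, pseudo_labels(logits_b))
    loss_b = cross_entropy(logits_b, pseudo_labels(logits_a))
    return loss_a + loss_b


# Toy outputs of two networks for the same 2-pixel, 3-class image.
logits_a = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
logits_b = [[1.5, 0.0, 0.0], [2.5, 0.0, 0.3]]

loss_disagree = cps_loss(logits_a, logits_b)  # networks disagree on pixel 2
loss_agree = cps_loss(logits_a, logits_a)     # identical predictions
print(loss_disagree > loss_agree)
```

In this toy, the two networks disagree on the second pixel, so the loss is larger than when both produce the same prediction. That is the sense in which the CPS term encourages similar predictions for the same input.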
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Segmentation | ScribbleKITTI | mIoU (1% Labels) | 33.7 | CPS (Range View) |
| Semantic Segmentation | ScribbleKITTI | mIoU (10% Labels) | 50 | CPS (Range View) |
| Semantic Segmentation | ScribbleKITTI | mIoU (20% Labels) | 52.8 | CPS (Range View) |
| Semantic Segmentation | ScribbleKITTI | mIoU (50% Labels) | 54.6 | CPS (Range View) |
| Semantic Segmentation | PASCAL VOC 2012 92 labeled | Validation mIoU | 64.1 | CPS (DeepLab v3+ with ResNet-101) |
| Semantic Segmentation | PASCAL VOC 2012 732 labeled | Validation mIoU | 75.9 | CPS (DeepLab v3+ with ResNet-101) |
| Semantic Segmentation | WoodScape | Mean IoU | 62.87 | CPS |
| Semantic Segmentation | PASCAL VOC 2012 366 labeled | Validation mIoU | 71.7 | CPS (DeepLab v3+ with ResNet-101) |
| Semantic Segmentation | Cityscapes 6.25% labeled | Validation mIoU | 69.8 | CPS (DeepLab v3+ with ResNet-101) |
| Semantic Segmentation | PASCAL VOC 2012 183 labeled | Validation mIoU | 67.4 | CPS (DeepLab v3+ with ResNet-101) |
| Semantic Segmentation | SemanticKITTI | mIoU (1% Labels) | 36.5 | CPS (Range View) |
| Semantic Segmentation | SemanticKITTI | mIoU (10% Labels) | 52.3 | CPS (Range View) |
| Semantic Segmentation | SemanticKITTI | mIoU (20% Labels) | 56.3 | CPS (Range View) |
| Semantic Segmentation | SemanticKITTI | mIoU (50% Labels) | 57.4 | CPS (Range View) |
| Semantic Segmentation | nuScenes | mIoU (1% Labels) | 40.7 | CPS (Range View) |
| Semantic Segmentation | nuScenes | mIoU (10% Labels) | 60.8 | CPS (Range View) |
| Semantic Segmentation | nuScenes | mIoU (20% Labels) | 64.9 | CPS (Range View) |
| Semantic Segmentation | nuScenes | mIoU (50% Labels) | 68 | CPS (Range View) |