Binhui Xie, Shuang Li, Mingjia Li, Chi Harold Liu, Gao Huang, Guoren Wang
Domain adaptive semantic segmentation aims to make satisfactory dense predictions on an unlabeled target domain by leveraging a supervised model trained on a labeled source domain. In this work, we propose Semantic-Guided Pixel Contrast (SePiCo), a novel one-stage adaptation framework that highlights the semantic concepts of individual pixels to promote the learning of class-discriminative and class-balanced pixel representations across domains, ultimately boosting the performance of self-training methods. Specifically, to explore proper semantic concepts, we first investigate a centroid-aware pixel contrast that employs the category centroids of the entire source domain, or of a single source image, to guide the learning of discriminative features. Considering the possible lack of category diversity in these semantic concepts, we then take a distributional perspective to involve a sufficient quantity of instances, namely distribution-aware pixel contrast, in which we approximate the true distribution of each semantic category from the statistics of labeled source data. Moreover, this optimization objective admits a closed-form upper bound that implicitly involves an infinite number of (dis)similar pairs, making it computationally efficient. Extensive experiments show that SePiCo not only helps stabilize training but also yields discriminative representations, achieving significant gains in both synthetic-to-real and daytime-to-nighttime adaptation scenarios.
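The centroid-aware pixel contrast described above can be sketched as an InfoNCE-style objective in which each pixel embedding is pulled towards the centroid of its own category and pushed away from the other categories' centroids. The following is a minimal NumPy sketch under stated assumptions, not the released implementation: the function name, the flattened `(N, D)` feature shape, the source of the centroids (e.g. averages over labeled source pixels), and the temperature value are all illustrative.

```python
import numpy as np

def centroid_pixel_contrast(feats, labels, centroids, tau=0.1):
    """InfoNCE-style centroid-aware pixel contrast (sketch).

    feats:     (N, D) pixel embeddings (e.g. flattened decoder features)
    labels:    (N,)  class index per pixel (ground truth or pseudo-label)
    centroids: (C, D) per-class feature centroids, e.g. averaged over
               labeled source pixels (an illustrative assumption here)
    Returns the mean cross-entropy of pixel-to-centroid similarities,
    treating the own-class centroid as the positive and the remaining
    C-1 centroids as negatives.
    """
    # cosine similarity via L2 normalization
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    centroids = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    logits = feats @ centroids.T / tau                 # (N, C)
    # numerically stable log-softmax
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # negative log-likelihood of the own-class centroid
    return -log_prob[np.arange(len(labels)), labels].mean()
```

Using cross-entropy over pixel-centroid logits keeps the cost linear in the number of classes, rather than in the number of pixel pairs.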
| Task | Benchmark | Metric | Value | Model |
|---|---|---|---|---|
| Unsupervised Domain Adaptation | GTAV-to-Cityscapes | mIoU | 70.3 | SePiCo |
| Unsupervised Domain Adaptation | GTAV-to-Cityscapes | mIoU | 61.0 | SePiCo (DeepLabV2) |
| Unsupervised Domain Adaptation | SYNTHIA-to-Cityscapes | mIoU (16 classes) | 64.3 | SePiCo |
| Unsupervised Domain Adaptation | SYNTHIA-to-Cityscapes | mIoU (13 classes) | 71.4 | SePiCo |
| Unsupervised Domain Adaptation | SYNTHIA-to-Cityscapes | mIoU (16 classes) | 58.1 | SePiCo (DeepLabV2-ResNet-101) |
| Unsupervised Domain Adaptation | SYNTHIA-to-Cityscapes | mIoU (13 classes) | 66.5 | SePiCo (DeepLabV2-ResNet-101) |
| Unsupervised Domain Adaptation | Dark Zurich | mIoU | 54.2 | SePiCo |
| Unsupervised Domain Adaptation | Dark Zurich | mIoU | 45.4 | SePiCo (DeepLabV2-ResNet-101) |
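The distribution-aware pixel contrast from the abstract can likewise be sketched. Assuming each category's features follow a diagonal Gaussian estimated from labeled source statistics, the Gaussian moment-generating function E[exp(tX)] = exp(tμ + t²σ²/2) lets contrasting against infinitely many samples collapse into a softmax over mean similarities plus a per-class variance correction. This is a hedged sketch of how such a closed-form bound can arise, not the authors' exact objective; the function name, shapes, and the diagonal-Gaussian assumption are illustrative.

```python
import numpy as np

def distribution_pixel_contrast(feats, labels, mu, var, tau=0.1):
    """Distribution-aware pixel contrast (sketch of a closed-form bound).

    feats:  (N, D) pixel embeddings
    labels: (N,)  class index per pixel
    mu:     (C, D) per-class feature means from labeled source statistics
    var:    (C, D) per-class diagonal variances (illustrative assumption)

    Each class logit is the mean similarity v.mu_c / tau plus the
    variance correction v^2 . var_c / (2 tau^2) implied by the Gaussian
    moment-generating function, so no individual samples are drawn.
    """
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    logits = feats @ mu.T / tau + (feats ** 2) @ var.T / (2 * tau ** 2)
    # numerically stable log-softmax over classes
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels].mean()
```

With the variances set to zero this reduces to the centroid-style contrast, which is one way to see the bound as a variance-aware generalization of contrasting against class means.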