Luke Melas-Kyriazi, Arjun K. Manrai
Unsupervised domain adaptation is a promising technique for semantic segmentation and other computer vision tasks for which large-scale data annotation is costly and time-consuming. In semantic segmentation, it is attractive to train models on annotated images from a simulated (source) domain and deploy them on real (target) domains. In this work, we present a novel framework for unsupervised domain adaptation based on the notion of target-domain consistency training. Intuitively, our work is based on the idea that in order to perform well on the target domain, a model's output should be consistent with respect to small perturbations of inputs in the target domain. Specifically, we introduce a new loss term to enforce pixelwise consistency between the model's predictions on a target image and a perturbed version of the same image. In comparison to popular adversarial adaptation methods, our approach is simpler, easier to implement, and more memory-efficient during training. Experiments and extensive ablation studies demonstrate that our simple approach achieves remarkably strong results on two challenging synthetic-to-real benchmarks, GTA5-to-Cityscapes and SYNTHIA-to-Cityscapes. Code is available at: https://github.com/lukemelas/pixmatch
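To make the consistency idea concrete, here is a minimal NumPy sketch of one possible pixelwise consistency term. The abstract does not specify the exact form of the loss, so everything here is an assumption for illustration: the clean prediction is turned into hard per-pixel pseudo-labels, and the prediction on the perturbed image is scored against them with cross-entropy. The names `pixelwise_consistency_loss`, `toy_model`, and the Gaussian-noise perturbation are hypothetical stand-ins, not the authors' implementation (see the linked repository for that).

```python
import numpy as np


def log_softmax(logits, axis=0):
    """Numerically stable log-softmax along the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def pixelwise_consistency_loss(logits_clean, logits_perturbed):
    """Cross-entropy between pseudo-labels from the clean target image and
    the prediction on its perturbed version.

    Both inputs have shape (C, H, W): C class scores for each pixel.
    Assumption: the clean prediction is used as a fixed target (in a real
    training loop it would typically be excluded from the gradient).
    """
    pseudo_labels = logits_clean.argmax(axis=0)        # (H, W) class ids
    log_probs = log_softmax(logits_perturbed, axis=0)  # (C, H, W)
    # Pick, for every pixel, the log-probability of its pseudo-label.
    picked = np.take_along_axis(log_probs, pseudo_labels[None], axis=0)[0]
    return float(-picked.mean())


def toy_model(image, weights):
    """Hypothetical stand-in for a segmentation network: a per-pixel linear
    classifier mapping a (3, H, W) image to (C, H, W) logits."""
    return np.einsum("kc,chw->khw", weights, image)


rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 3))                 # 4 classes, 3 channels
target_image = rng.uniform(size=(3, 8, 8))        # unlabeled target image
noise = rng.normal(scale=0.05, size=target_image.shape)
perturbed_image = np.clip(target_image + noise, 0.0, 1.0)

loss = pixelwise_consistency_loss(
    toy_model(target_image, weights),
    toy_model(perturbed_image, weights),
)
print(f"consistency loss: {loss:.4f}")
```

In a full training setup this term would presumably be added, with some weighting, to the usual supervised segmentation loss on labeled source images. The weighting and the choice of perturbation are design decisions the abstract leaves open.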
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image-to-Image Translation | GTAV-to-Cityscapes Labels | mIoU | 50.3 | PixMatch |
| Image-to-Image Translation | SYNTHIA-to-Cityscapes | mIoU (13 classes) | 54.5 | PixMatch (ResNet-101) |
| Image-to-Image Translation | SYNTHIA-to-Cityscapes | mIoU (16 classes) | 46.1 | PixMatch (ResNet-101) |