Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Contrastive Learning and Self-Training for Unsupervised Domain Adaptation in Semantic Segmentation

Robert A. Marsden, Alexander Bartler, Mario Döbler, Bin Yang

2021-05-05 · Transfer Learning · Semantic Segmentation · Synthetic-to-Real Translation · Contrastive Learning · Unsupervised Domain Adaptation · Domain Adaptation

Paper · PDF

Abstract

Deep convolutional neural networks have considerably improved state-of-the-art results for semantic segmentation. Nevertheless, even modern architectures lack the ability to generalize well to a test dataset that originates from a different domain. To avoid the costly annotation of training data for unseen domains, unsupervised domain adaptation (UDA) attempts to provide efficient knowledge transfer from a labeled source domain to an unlabeled target domain. Previous work has mainly focused on minimizing the discrepancy between the two domains by using adversarial training or self-training. While adversarial training may fail to align the correct semantic categories as it minimizes the discrepancy between the global distributions, self-training raises the question of how to provide reliable pseudo-labels. To align the correct semantic categories across domains, we propose a contrastive learning approach that adapts category-wise centroids across domains. Furthermore, we extend our method with self-training, where we use a memory-efficient temporal ensemble to generate consistent and reliable pseudo-labels. Although both contrastive learning and self-training (CLST) through temporal ensembling enable knowledge transfer between two domains, it is their combination that leads to a symbiotic structure. We validate our approach on two domain adaptation benchmarks: GTA5 $\rightarrow$ Cityscapes and SYNTHIA $\rightarrow$ Cityscapes. Our method achieves results that are better than or comparable to the state of the art. We will make the code publicly available.
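The abstract names two mechanisms: category-wise centroids that a contrastive loss can align across domains, and a memory-efficient temporal ensemble (an exponential moving average of predictions) from which reliable pseudo-labels are drawn. A minimal pure-Python sketch of both ideas follows; the function names, the EMA decay, and the confidence threshold are illustrative assumptions, not the paper's released code.

```python
# Hypothetical sketch of two ingredients described in the abstract.
# Not the authors' implementation: names, decay, and threshold are assumed.

def class_centroids(features, labels, num_classes):
    """Mean feature vector per semantic class (None if a class is absent).
    A cross-domain contrastive loss could pull source/target centroids
    of the same class together and push different classes apart."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for f, y in zip(features, labels):
        counts[y] += 1
        for d in range(dim):
            sums[y][d] += f[d]
    return [
        [s / counts[c] for s in sums[c]] if counts[c] else None
        for c in range(num_classes)
    ]

def ema_update(ensemble, probs, decay=0.9):
    """Temporal ensemble: exponential moving average of per-pixel class
    probabilities. Only one running average per pixel is stored, which is
    what makes the ensemble memory-efficient."""
    return [
        [decay * e + (1.0 - decay) * p for e, p in zip(e_px, p_px)]
        for e_px, p_px in zip(ensemble, probs)
    ]

def pseudo_labels(ensemble, threshold=0.8):
    """Keep a pixel's argmax class only if the ensembled confidence is
    high enough; -1 marks pixels ignored by the self-training loss."""
    out = []
    for probs in ensemble:
        conf = max(probs)
        out.append(probs.index(conf) if conf >= threshold else -1)
    return out
```

For example, `pseudo_labels([[0.9, 0.1], [0.6, 0.4]])` keeps the confident first pixel (class 0) and ignores the second, which is the filtering step that makes self-training pseudo-labels reliable.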

Results

Task                       | Dataset                   | Metric            | Value | Model
Image-to-Image Translation | GTAV-to-Cityscapes Labels | mIoU              | 51.6  | CLST
Image-to-Image Translation | SYNTHIA-to-Cityscapes     | mIoU (13 classes) | 57.8  | CLST (ResNet-101)
Image-to-Image Translation | SYNTHIA-to-Cityscapes     | mIoU (16 classes) | 49.8  | CLST (ResNet-101)

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
RaMen: Multi-Strategy Multi-Modal Learning for Bundle Construction (2025-07-18)
Disentangling coincident cell events using deep transfer learning and compressive sensing (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts (2025-07-17)