MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation

Lukas Hoyer, Dengxin Dai, Haoran Wang, Luc van Gool

2022-12-02, CVPR 2023

Tasks: Image Classification, Semantic Segmentation, Synthetic-to-Real Translation, Unsupervised Domain Adaptation, Object Detection, Image-to-Image Translation, Domain Adaptation

Abstract

In unsupervised domain adaptation (UDA), a model trained on source data (e.g. synthetic) is adapted to target data (e.g. real-world) without access to target annotations. Most previous UDA methods struggle with classes that have a similar visual appearance on the target domain, as no ground truth is available to learn the slight appearance differences. To address this problem, we propose a Masked Image Consistency (MIC) module to enhance UDA by learning spatial context relations of the target domain as additional clues for robust visual recognition. MIC enforces the consistency between predictions of masked target images, where random patches are withheld, and pseudo-labels that are generated based on the complete image by an exponential moving average teacher. To minimize the consistency loss, the network has to learn to infer the predictions of the masked regions from their context. Due to its simple and universal concept, MIC can be integrated into various UDA methods across different visual recognition tasks such as image classification, semantic segmentation, and object detection. MIC significantly improves the state-of-the-art performance across the different recognition tasks for synthetic-to-real, day-to-nighttime, and clear-to-adverse-weather UDA. For instance, MIC achieves an unprecedented UDA performance of 75.9 mIoU and 92.8% on GTA-to-Cityscapes and VisDA-2017, respectively, which corresponds to an improvement of +2.1 and +3.0 percentage points over the previous state of the art. The implementation is available at https://github.com/lhoyer/MIC.
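
The three ingredients named in the abstract (random patch masking, an exponential moving average teacher, and a consistency loss against the teacher's pseudo-labels) can be illustrated with a minimal, dependency-free sketch. This is not the official implementation from the repository above. The function names, the patch size, the mask ratio, and the EMA factor are all hypothetical, and the use of hard argmax pseudo-labels with an unweighted cross-entropy is a simplifying assumption.

```python
import math
import random


def patch_mask(height, width, patch_size, mask_ratio, rng):
    """Build a height x width grid of 0/1 flags (0 = withheld pixel).

    Assumption: each patch_size x patch_size patch is withheld
    independently with probability mask_ratio.
    """
    rows = math.ceil(height / patch_size)
    cols = math.ceil(width / patch_size)
    keep = [[0 if rng.random() < mask_ratio else 1 for _ in range(cols)]
            for _ in range(rows)]
    # Expand the per-patch decision to every pixel of that patch.
    return [[keep[y // patch_size][x // patch_size] for x in range(width)]
            for y in range(height)]


def apply_mask(image, mask):
    """Zero out the withheld pixels of a single-channel image."""
    return [[pixel * m for pixel, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


def ema_update(teacher, student, alpha):
    """Move the teacher weights towards the student weights (EMA)."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def pseudo_labels(teacher_probs):
    """Hard pseudo-labels: per-pixel argmax of the teacher's class
    probabilities, computed on the complete (unmasked) image."""
    return [max(range(len(p)), key=p.__getitem__) for p in teacher_probs]


def consistency_loss(student_probs, labels):
    """Mean cross-entropy between the student's predictions on the masked
    image and the pseudo-labels, taken over all pixels, including the
    withheld ones, so they must be inferred from context."""
    eps = 1e-12  # guards against log(0)
    total = sum(math.log(p[c] + eps) for p, c in zip(student_probs, labels))
    return -total / len(labels)


if __name__ == "__main__":
    rng = random.Random(0)
    image = [[1.0] * 8 for _ in range(8)]
    mask = patch_mask(8, 8, patch_size=4, mask_ratio=0.5, rng=rng)
    masked_image = apply_mask(image, mask)

    # Toy per-pixel class probabilities for two pixels and two classes.
    teacher_probs = [[0.1, 0.9], [0.8, 0.2]]  # teacher sees the full image
    student_probs = [[0.3, 0.7], [0.6, 0.4]]  # student sees the masked image
    labels = pseudo_labels(teacher_probs)
    print("consistency loss:", consistency_loss(student_probs, labels))
    print("teacher after EMA:", ema_update([1.0, 2.0], [0.0, 0.0], alpha=0.9))
```

In a real training loop the student would be a network evaluated on `masked_image` and the teacher would be its EMA copy evaluated on `image`. The point of the sketch is only that the loss is computed on every pixel while part of the input is hidden.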

Results

Task	Dataset	Metric	Value	Model
Image-to-Image Translation	Cityscapes-to-Foggy Cityscapes	mAP	47.6	MIC
Image-to-Image Translation	SYNTHIA-to-Cityscapes	mIoU (13 classes)	74	MIC
Image-to-Image Translation	GTAV-to-Cityscapes Labels	mIoU	75.9	MIC
Image-to-Image Translation	GTAV-to-Cityscapes Labels	mIoU	75.9	HRDA+MIC
Image-to-Image Translation	SYNTHIA-to-Cityscapes	mIoU (16 classes)	67.3	MIC
Domain Adaptation	SYNTHIA-to-Cityscapes	mIoU	67.3	MIC
Domain Adaptation	GTA5 to Cityscapes	mIoU	75.9	MIC
Domain Adaptation	Cityscapes to ACDC	mIoU	70.4	MIC
Domain Adaptation	VisDA2017	Accuracy	92.8	MIC
Domain Adaptation	Office-Home	Accuracy	86.2	MIC
Domain Adaptation	GTAV-to-Cityscapes Labels	mIoU	75.9	MIC
Domain Adaptation	Cityscapes to Foggy Cityscapes	mAP@0.5	47.6	MIC
Domain Adaptation	SYNTHIA-to-Cityscapes	mIoU (13 classes)	74	MIC
Semantic Segmentation	Dark Zurich	mIoU	60.2	MIC
Semantic Segmentation	GTAV-to-Cityscapes Labels	mIoU	75.9	MIC
Semantic Segmentation	SYNTHIA-to-Cityscapes	mIoU	67.3	MIC
Unsupervised Domain Adaptation	GTAV-to-Cityscapes Labels	mIoU	75.9	MIC
Unsupervised Domain Adaptation	Cityscapes to Foggy Cityscapes	mAP@0.5	47.6	MIC
Unsupervised Domain Adaptation	SYNTHIA-to-Cityscapes	mIoU	67.3	MIC
Unsupervised Domain Adaptation	SYNTHIA-to-Cityscapes	mIoU (13 classes)	74	MIC
