Runfa Chen, Yu Rong, Shangmin Guo, Jiaqi Han, Fuchun Sun, Tingyang Xu, Wenbing Huang
Following the great success of Vision Transformer variants (ViTs) in computer vision, they have also demonstrated great potential in domain adaptive semantic segmentation. Unfortunately, straightforwardly applying local ViTs to domain adaptive semantic segmentation does not bring the expected improvement. We find that the pitfall of local ViTs lies in the severe high-frequency components generated during both pseudo-label construction and feature alignment for the target domain. These high-frequency components make the training of local ViTs very unsmooth and hurt their transferability. In this paper, we introduce a low-pass filtering mechanism, the momentum network, to smooth the learning dynamics of target-domain features and pseudo labels. Furthermore, we propose a dynamic discrepancy measurement that aligns the source and target distributions via dynamic weights evaluating the importance of each sample. With these issues tackled, extensive experiments on sim2real benchmarks show that the proposed method outperforms state-of-the-art methods. Our code is available at https://github.com/alpc91/TransDA
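The momentum network described above acts as a low-pass filter in weight space: a copy of the segmentation network whose parameters track the student's by an exponential moving average, so that target-domain pseudo labels come from smoothed weights rather than the rapidly changing student. A minimal PyTorch sketch of this idea is below; the tiny `student` network, the decay value `alpha=0.999`, and the `momentum_update` helper are illustrative assumptions, not the paper's actual architecture or hyperparameters.

```python
import copy
import torch
import torch.nn as nn

# Hypothetical student segmentation network (stand-in for a local ViT backbone;
# 19 output channels mirrors the Cityscapes class count).
student = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 19, 1))

# Momentum (teacher) network: a frozen weight-space copy updated only by EMA.
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)

@torch.no_grad()
def momentum_update(teacher, student, alpha=0.999):
    """EMA update: teacher <- alpha * teacher + (1 - alpha) * student.

    With alpha close to 1 this low-pass-filters the weight trajectory,
    smoothing the target-domain features and pseudo labels derived from
    the teacher.
    """
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(alpha).add_(ps, alpha=1 - alpha)

# Usage: pseudo labels for a target-domain image come from the smoothed teacher.
target_image = torch.randn(1, 3, 64, 64)
with torch.no_grad():
    pseudo_label = teacher(target_image).argmax(dim=1)  # per-pixel class ids
momentum_update(teacher, student)  # called after each student optimizer step
```

In this kind of scheme the teacher is never trained by backpropagation; its stability is exactly what damps the high-frequency components that the abstract identifies as the obstacle for local ViTs.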
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Unsupervised Domain Adaptation (Semantic Segmentation) | GTAV-to-Cityscapes | mIoU | 63.9 | TransDA-B |
| Unsupervised Domain Adaptation (Semantic Segmentation) | SYNTHIA-to-Cityscapes | mIoU (13 classes) | 66.3 | TransDA-B |
| Unsupervised Domain Adaptation (Semantic Segmentation) | SYNTHIA-to-Cityscapes | mIoU (16 classes) | 59.3 | TransDA-B |