Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Learning Content-enhanced Mask Transformer for Domain Generalized Urban-Scene Segmentation

Qi Bi, ShaoDi You, Theo Gevers

Published: 2023-07-01
Tasks: Source-Free Domain Adaptation, Scene Segmentation, Domain Generalization, Segmentation, Semantic Segmentation, Synthetic-to-Real Translation, Domain Adaptation
Links: Paper | PDF | Code (official)

Abstract

Domain-generalized urban-scene semantic segmentation (USSS) aims to learn generalized semantic predictions across diverse urban-scene styles. Unlike domain gap challenges, USSS is unique in that the semantic categories are often similar in different urban scenes, while the styles can vary significantly due to changes in urban landscapes, weather conditions, lighting, and other factors. Existing approaches typically rely on convolutional neural networks (CNNs) to learn the content of urban scenes. In this paper, we propose a Content-enhanced Mask TransFormer (CMFormer) for domain-generalized USSS. The main idea is to enhance the focus of the fundamental component, the mask attention mechanism, in Transformer segmentation models on content information. To achieve this, we introduce a novel content-enhanced mask attention mechanism. It learns mask queries from both the image feature and its down-sampled counterpart, as lower-resolution image features usually contain more robust content information and are less sensitive to style variations. These features are fused into a Transformer decoder and integrated into a multi-resolution content-enhanced mask attention learning scheme. Extensive experiments conducted on various domain-generalized urban-scene segmentation datasets demonstrate that the proposed CMFormer significantly outperforms existing CNN-based methods for domain-generalized semantic segmentation, achieving improvements of up to 14.00% in terms of mIoU (mean intersection over union). The source code is publicly available at https://github.com/BiQiWHU/CMFormer.
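The multi-resolution idea in the abstract — attending with the same mask queries over both the full-resolution feature map and a down-sampled, more style-robust copy, then fusing the results — might be sketched roughly as follows. This is a NumPy toy, not the paper's implementation: the shapes, the average-pool downsampling, and the simple averaging fusion are all illustrative assumptions (CMFormer performs the fusion inside a Transformer decoder).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def downsample(feat, factor=2):
    # Average-pool the spatial dimensions of an (H, W, C) feature map.
    H, W, C = feat.shape
    return feat.reshape(H // factor, factor, W // factor, factor, C).mean(axis=(1, 3))

def mask_attention(queries, feat):
    # queries: (N, C) mask queries; feat: (H, W, C) feature map.
    # Scaled dot-product attention of queries over all spatial locations.
    keys = feat.reshape(-1, feat.shape[-1])                     # (H*W, C)
    attn = softmax(queries @ keys.T / np.sqrt(queries.shape[-1]))
    return attn @ keys                                          # updated queries (N, C)

def content_enhanced_mask_attention(queries, feat):
    # Attend over the full-resolution feature and its down-sampled
    # counterpart, then fuse. The 0.5/0.5 average is a placeholder
    # for the learned fusion in the actual Transformer decoder.
    q_hi = mask_attention(queries, feat)
    q_lo = mask_attention(queries, downsample(feat))
    return 0.5 * (q_hi + q_lo)

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 16))   # toy feature map
queries = rng.standard_normal((4, 16))   # 4 mask queries
out = content_enhanced_mask_attention(queries, feat)
print(out.shape)  # (4, 16)
```

The point of the low-resolution branch is that coarse features retain scene content while washing out high-frequency style cues, so queries updated from both resolutions lean more on content.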

Results

Task                          | Dataset                                | Metric | Value | Model
------------------------------|----------------------------------------|--------|-------|---------
Image-to-Image Translation    | GTAV-to-Cityscapes Labels              | mIoU   | 59.7  | CMFormer
Image-to-Image Translation    | SYNTHIA-to-Cityscapes Labels           | mIoU   | 44.6  | CMFormer
Domain Adaptation             | Cityscapes to ACDC                     | mIoU   | 60.1  | CMFormer
Domain Adaptation             | GTA-to-Avg(Cityscapes,BDD,Mapillary)   | mIoU   | 51.1  | CMFormer
Domain Adaptation             | GTA5-to-Cityscapes                     | mIoU   | 55.31 | CMFormer
Image Generation              | GTAV-to-Cityscapes Labels              | mIoU   | 59.7  | CMFormer
Image Generation              | SYNTHIA-to-Cityscapes Labels           | mIoU   | 44.6  | CMFormer
Semantic Segmentation         | GTAV-to-Cityscapes Labels              | mIoU   | 55.3  | CMFormer
Domain Generalization         | GTA-to-Avg(Cityscapes,BDD,Mapillary)   | mIoU   | 51.1  | CMFormer
Domain Generalization         | GTA5-to-Cityscapes                     | mIoU   | 55.31 | CMFormer
10-shot image generation      | GTAV-to-Cityscapes Labels              | mIoU   | 55.3  | CMFormer
1 Image, 2*2 Stitching        | GTAV-to-Cityscapes Labels              | mIoU   | 59.7  | CMFormer
1 Image, 2*2 Stitching        | SYNTHIA-to-Cityscapes Labels           | mIoU   | 44.6  | CMFormer
Source-Free Domain Adaptation | Cityscapes to ACDC                     | mIoU   | 60.1  | CMFormer
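All of the results above are reported as mIoU (mean intersection over union): the per-class overlap between predicted and ground-truth segmentation masks, averaged over classes. A minimal toy computation, assuming integer class-label maps (the 2x2 arrays and class count here are made up for illustration):

```python
import numpy as np

def miou(pred, target, num_classes):
    # Mean IoU: for each class, |pred ∩ target| / |pred ∪ target|,
    # averaged over classes that appear in either map.
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

pred   = np.array([[0, 0], [1, 1]])   # toy predicted label map
target = np.array([[0, 1], [1, 1]])   # toy ground-truth label map
print(round(miou(pred, target, num_classes=2), 4))  # 0.5833
```

Here class 0 scores 1/2 and class 1 scores 2/3, giving a mean of about 0.5833; benchmark numbers like 59.7 are the same quantity expressed as a percentage over the full test set and the full class list.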

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Simulate, Refocus and Ensemble: An Attention-Refocusing Scheme for Domain Generalization (2025-07-17)
GLAD: Generalizable Tuning for Vision-Language Models (2025-07-17)
MoTM: Towards a Foundation Model for Time Series Imputation based on Continuous Modeling (2025-07-17)
Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation (2025-07-17)
Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion (2025-07-17)