A Re-Parameterized Vision Transformer (ReVT) for Domain-Generalized Semantic Segmentation

Jan-Aike Termöhlen, Timo Bartels, Tim Fingscheidt

2023-08-25 · Domain Generalization, Segmentation, Semantic Segmentation

Abstract

The task of semantic segmentation requires a model to assign a semantic label to each pixel of an image. However, the performance of such models degrades when they are deployed in an unseen domain whose data distribution differs from that of the training domain. We present a new augmentation-driven approach to domain generalization for semantic segmentation using a re-parameterized vision transformer (ReVT) with weight averaging of multiple models after training. We evaluate our approach on several commonly used benchmark datasets and achieve state-of-the-art mIoU performance of 47.3% (prior art: 46.3%) for small models and of 50.1% (prior art: 47.8%) for midsized models. At the same time, our method requires fewer parameters and reaches a higher frame rate than the best prior art. It is also easy to implement and, unlike network ensembles, does not add any computational complexity during inference.
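The abstract credits the zero inference overhead to weight averaging: several trained models are merged into one set of parameters, so at test time only a single network runs, unlike an ensemble that runs every member. The sketch below shows plain uniform averaging only. It assumes all models share an identical architecture and parameter names, and it represents parameters as a hypothetical `dict` of name to flat list of floats, standing in for framework tensors (e.g. a PyTorch `state_dict`). Which network parts ReVT averages, and how the models are trained and initialized, is not stated here and is not modeled.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a model's parameters: name -> flat list of floats.
StateDict = Dict[str, List[float]]


def average_weights(state_dicts: Sequence[StateDict]) -> StateDict:
    """Uniformly average the parameters of models with the same architecture."""
    if not state_dicts:
        raise ValueError("need at least one model")
    keys = state_dicts[0].keys()
    for sd in state_dicts[1:]:
        if sd.keys() != keys:
            raise ValueError("all models must have identical parameter names")

    n = len(state_dicts)
    averaged: StateDict = {}
    for name in keys:
        tensors = [sd[name] for sd in state_dicts]
        length = len(tensors[0])
        if any(len(t) != length for t in tensors):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise mean across models: one merged parameter set results,
        # so inference cost equals that of a single model.
        averaged[name] = [sum(values) / n for values in zip(*tensors)]
    return averaged


if __name__ == "__main__":
    model_a = {"w": [1.0, 2.0], "b": [0.0]}
    model_b = {"w": [3.0, 4.0], "b": [1.0]}
    print(average_weights([model_a, model_b]))  # {'w': [2.0, 3.0], 'b': [0.5]}
```

Averaging weights is only meaningful when the models lie in a compatible region of parameter space (for example, when they are fine-tuned from a common initialization); that condition is assumed here rather than enforced.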

Results

Task	Dataset	Metric	Value (%)	Model
Domain Adaptation	GTA-to-Avg(Cityscapes,BDD,Mapillary)	mIoU	50.2	ReVT
Domain Generalization	GTA-to-Avg(Cityscapes,BDD,Mapillary)	mIoU	50.2	ReVT
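The reported metric is mIoU, and the dataset entry "GTA-to-Avg(Cityscapes,BDD,Mapillary)" is read here as training on GTA and reporting the arithmetic mean of the mIoU obtained on the three target datasets. The sketch below computes mIoU from a confusion matrix using the standard per-class definition IoU = TP / (TP + FP + FN), then averages over target datasets. The function names are hypothetical, and skipping classes that never occur in either ground truth or prediction is an assumption; the exact evaluation protocol (class set, ignore labels) is not specified in this listing.

```python
from typing import List, Mapping, Sequence


def miou_from_confusion(conf: Sequence[Sequence[int]]) -> float:
    """Mean IoU from a square confusion matrix where conf[gt][pred] = pixel count."""
    num_classes = len(conf)
    ious: List[float] = []
    for c in range(num_classes):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp  # ground truth is c, predicted as another class
        fp = sum(conf[r][c] for r in range(num_classes)) - tp  # predicted c, wrong
        denom = tp + fp + fn
        if denom > 0:  # assumption: ignore classes absent from gt and prediction
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


def average_over_targets(per_dataset_miou: Mapping[str, float]) -> float:
    """Arithmetic mean of per-target-dataset mIoU values."""
    return sum(per_dataset_miou.values()) / len(per_dataset_miou)


if __name__ == "__main__":
    # Toy 2-class confusion matrix, not data from the paper.
    toy = [[3, 1],
           [1, 5]]
    print(miou_from_confusion(toy))  # (3/5 + 5/7) / 2
    print(average_over_targets({"A": 0.4, "B": 0.5, "C": 0.6}))
```

Because the average is taken over datasets rather than pooled pixels, each target domain contributes equally regardless of its size.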
