Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Strong but simple: A Baseline for Domain Generalized Dense Perception by CLIP-based Transfer Learning

Christoph Hümmer, Manuel Schwonberg, Liangwei Zhou, Hu Cao, Alois Knoll, Hanno Gottschalk

2023-12-04 · Domain Generalization · Segmentation · Transfer Learning · Semantic Segmentation · Robust Object Detection · Object Detection

Abstract

Domain generalization (DG) remains a significant challenge for perception based on deep neural networks (DNNs), where domain shifts occur due to synthetic data, lighting, weather, or location changes. Vision-language models (VLMs) marked a large step forward in generalization capability and have already been applied to various tasks. Very recently, the first approaches utilized VLMs for domain generalized segmentation and object detection and obtained strong generalization. However, all of these approaches rely on complex modules, feature augmentation frameworks, or additional models. Surprisingly, and in contrast to that, we found that simply fine-tuning vision-language pre-trained models yields competitive or even stronger generalization results while being extremely simple to apply. Moreover, we found that vision-language pre-training consistently provides better generalization than the previous standard of vision-only pre-training. This challenges the standard of using ImageNet-based transfer learning for domain generalization. Fully fine-tuning a vision-language pre-trained model reaches the domain generalization SOTA when training on the synthetic GTA5 dataset. Moreover, we confirm this observation for object detection on a novel synthetic-to-real benchmark. We further obtain superior generalization capabilities, reaching 77.9% mIoU on the popular Cityscapes-to-ACDC benchmark. We also found improved in-domain generalization, leading to an improved SOTA of 86.4% mIoU on the Cityscapes test set, marking first place on the leaderboard.
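The recipe the abstract describes can be sketched in a few lines: take a vision-language pre-trained image encoder (in the paper, an EVA02-CLIP vision tower), attach a standard dense-prediction head, and fine-tune every parameter end to end, with no extra modules or feature augmentation. The sketch below is a minimal, hedged illustration of that setup, not the paper's implementation: `TinyEncoder` is a hypothetical stand-in for the real CLIP backbone, which would normally be loaded from a pre-trained checkpoint.

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Stand-in for a CLIP/EVA02 vision encoder (features at 1/4 scale)."""
    def __init__(self, dim=64):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, dim, kernel_size=4, stride=4),  # patchify, 1/4 resolution
            nn.GELU(),
            nn.Conv2d(dim, dim, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return self.stem(x)

class SegModel(nn.Module):
    def __init__(self, encoder, dim=64, num_classes=19):  # 19 = Cityscapes classes
        super().__init__()
        self.encoder = encoder  # VLM pre-trained weights would be loaded here
        self.head = nn.Conv2d(dim, num_classes, kernel_size=1)  # simple dense head

    def forward(self, x):
        logits = self.head(self.encoder(x))
        # upsample per-pixel logits back to the input resolution
        return nn.functional.interpolate(
            logits, size=x.shape[-2:], mode="bilinear", align_corners=False)

model = SegModel(TinyEncoder())
# Full fine-tuning: every parameter, encoder included, receives gradients.
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

x = torch.randn(2, 3, 64, 64)               # dummy batch of RGB crops
target = torch.randint(0, 19, (2, 64, 64))  # dummy per-pixel class labels
loss = nn.functional.cross_entropy(model(x), target)
loss.backward()
opt.step()
```

The design point the paper makes is that this plain setup, given vision-language rather than ImageNet pre-training, is enough to match or beat far more elaborate domain-generalization pipelines.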

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Domain Generalization | GTA-to-Avg(Cityscapes, BDD, Mapillary) | mIoU | 63.5 | VLTSeg |
| Domain Generalization | GTA5-to-Cityscapes | mIoU | 65.6 | VLTSeg (EVA02-CLIP-L) |
| Semantic Segmentation | Cityscapes test | Mean IoU (class) | 86.4 | VLTSeg |
| Semantic Segmentation | BDD100K val | mIoU | 72.5 | VLTSeg |
| Object Detection | DWD | mPC [AP50] | 36.9 | VLTDet |

Related Papers

- SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
- RaMen: Multi-Strategy Multi-Modal Learning for Bundle Construction (2025-07-18)
- Simulate, Refocus and Ensemble: An Attention-Refocusing Scheme for Domain Generalization (2025-07-17)
- GLAD: Generalizable Tuning for Vision-Language Models (2025-07-17)
- MoTM: Towards a Foundation Model for Time Series Imputation based on Continuous Modeling (2025-07-17)
- Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction (2025-07-17)
- DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
- From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation (2025-07-17)