
SWAD: Domain Generalization by Seeking Flat Minima

Junbum Cha, Sanghyuk Chun, Kyungjae Lee, Han-Cheol Cho, Seunghyun Park, Yunsung Lee, Sungrae Park

Published 2021-02-17 · NeurIPS 2021
Tasks: Domain Generalization, Robust Classification
Links: Paper · PDF · Code (official)

Abstract

Domain generalization (DG) methods aim to achieve generalizability to an unseen target domain by using only training data from the source domains. Although a variety of DG methods have been proposed, a recent study shows that under a fair evaluation protocol, called DomainBed, the simple empirical risk minimization (ERM) approach performs comparably to or even outperforms previous methods. Unfortunately, simply solving ERM on a complex, non-convex loss function can easily lead to sub-optimal generalizability by seeking sharp minima. In this paper, we theoretically show that finding flat minima results in a smaller domain generalization gap. We also propose a simple yet effective method, named Stochastic Weight Averaging Densely (SWAD), to find flat minima. SWAD finds flatter minima and suffers less from overfitting than vanilla SWA, thanks to its dense and overfit-aware stochastic weight sampling strategy. SWAD achieves state-of-the-art performance on five DG benchmarks, namely PACS, VLCS, OfficeHome, TerraIncognita, and DomainNet, with consistent and large margins of +1.6% on average in out-of-domain accuracy. We also compare SWAD with conventional generalization methods, such as data augmentation and consistency regularization, to verify that the remarkable performance improvements originate from seeking flat minima, not from better in-domain generalizability. Last but not least, SWAD is readily adaptable to existing DG methods without modification; the combination of SWAD and an existing DG method further improves DG performance. Source code is available at https://github.com/khanrc/swad.
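The core idea is easy to prototype: instead of averaging one weight snapshot per epoch as in vanilla SWA, SWAD averages weights densely at every iteration, and chooses the start and end of the averaging window from validation loss so the average excludes the overfitting phase. The sketch below illustrates that idea in plain PyTorch; the toy model and data, the helper names, and the simple tolerance rule for closing the window are illustrative assumptions rather than the paper's exact schedule — see the official repository at https://github.com/khanrc/swad for the real implementation.

```python
# Minimal sketch of dense, overfit-aware weight averaging (SWAD-style).
# The toy model/data and the tolerance rule below are illustrative
# assumptions, not the authors' exact procedure.
import copy
import torch
import torch.nn as nn

def average_state_dicts(state_dicts):
    """Element-wise average of floating-point tensors across snapshots."""
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        if torch.is_floating_point(avg[key]):
            avg[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(0)
    return avg

torch.manual_seed(0)
model = nn.Linear(10, 2)                    # stand-in for a ResNet-50 backbone
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x_tr, y_tr = torch.randn(256, 10), torch.randint(0, 2, (256,))
x_va, y_va = torch.randn(64, 10), torch.randint(0, 2, (64,))

snapshots, val_losses = [], []
tolerance = 1.05  # close the window once val loss exceeds its minimum by 5%

for step in range(200):
    opt.zero_grad()
    loss_fn(model(x_tr), y_tr).backward()
    opt.step()

    # Dense sampling: snapshot the weights at *every* iteration, not per epoch.
    snapshots.append(copy.deepcopy(model.state_dict()))
    with torch.no_grad():
        val_losses.append(loss_fn(model(x_va), y_va).item())

    # Overfit-aware end point: stop once validation loss clearly rebounds.
    if len(val_losses) > 10 and val_losses[-1] > tolerance * min(val_losses):
        break

# Overfit-aware start point: begin averaging at the validation-loss minimum.
start = val_losses.index(min(val_losses))
model.load_state_dict(average_state_dicts(snapshots[start:]))
print(f"averaged {len(snapshots) - start} snapshots from step {start}")
```

The key design choice is the dense snapshotting: averaging every iterate inside the window approximates the center of a flat region of the loss surface, which is what the paper's theory connects to a smaller domain generalization gap.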

Results

Task                   Dataset         Metric            Value  Model
Domain Adaptation      PACS            Average Accuracy  88.1   SWAD (ResNet-50)
Domain Adaptation      Office-Home     Average Accuracy  70.6   SWAD (ResNet-50)
Domain Adaptation      DomainNet       Average Accuracy  46.5   SWAD (ResNet-50)
Domain Adaptation      VLCS            Average Accuracy  79.1   SWAD (ResNet-50)
Domain Adaptation      TerraIncognita  Average Accuracy  50.0   SWAD (ResNet-50)
Domain Generalization  PACS            Average Accuracy  88.1   SWAD (ResNet-50)
Domain Generalization  Office-Home     Average Accuracy  70.6   SWAD (ResNet-50)
Domain Generalization  DomainNet       Average Accuracy  46.5   SWAD (ResNet-50)
Domain Generalization  VLCS            Average Accuracy  79.1   SWAD (ResNet-50)
Domain Generalization  TerraIncognita  Average Accuracy  50.0   SWAD (ResNet-50)

Related Papers

Simulate, Refocus and Ensemble: An Attention-Refocusing Scheme for Domain Generalization (2025-07-17)
GLAD: Generalizable Tuning for Vision-Language Models (2025-07-17)
MoTM: Towards a Foundation Model for Time Series Imputation based on Continuous Modeling (2025-07-17)
InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing (2025-07-16)
From Physics to Foundation Models: A Review of AI-Driven Quantitative Remote Sensing Inversion (2025-07-11)
Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion (2025-07-08)
Prompt-Free Conditional Diffusion for Multi-object Image Augmentation (2025-07-08)
Integrated Structural Prompt Learning for Vision-Language Models (2025-07-08)