Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Ensemble of Averages: Improving Model Selection and Boosting Performance in Domain Generalization

Devansh Arpit, Huan Wang, Yingbo Zhou, Caiming Xiong

Published: 2021-10-21 · Tasks: Domain Generalization, Model Selection
Links: Paper · PDF · Code (official)

Abstract

In Domain Generalization (DG) settings, models trained independently on a given set of training domains have notoriously chaotic performance on distribution-shifted test domains, and stochasticity in optimization (e.g. seed) plays a big role. This makes deep learning models unreliable in real-world settings. We first show that this chaotic behavior exists even along the training optimization trajectory of a single model, and propose a simple model averaging protocol that both significantly boosts domain generalization and diminishes the impact of stochasticity by improving the rank correlation between the in-domain validation accuracy and out-domain test accuracy, which is crucial for reliable early stopping. Taking advantage of our observation, we show that instead of ensembling unaveraged models (as is typical in practice), ensembling moving average models (EoA) from independent runs further boosts performance. We theoretically explain the boost in performance of ensembling and model averaging by adapting the well-known Bias-Variance trade-off to the domain generalization setting. On the DomainBed benchmark, when using a pre-trained ResNet-50, this ensemble of averages achieves an average of $68.0\%$, beating vanilla ERM (w/o averaging/ensembling) by $\sim 4\%$, and when using a pre-trained RegNetY-16GF, achieves an average of $76.6\%$, beating vanilla ERM by $6\%$. Our code is available at \url{https://github.com/salesforce/ensemble-of-averages}.
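The two ingredients described above — averaging weights along a single run's training trajectory, then ensembling the predictions of the averaged models from independent runs — can be sketched as follows. This is a minimal NumPy sketch with hypothetical function names, not the authors' implementation; the official code at the linked repository builds on PyTorch and DomainBed.

```python
import numpy as np

def update_running_average(avg_weights, current_weights, step):
    """Online mean of model weights along one training trajectory.

    After processing checkpoint `step` (0-indexed), each averaged tensor
    equals the arithmetic mean of the first `step + 1` checkpoints.
    """
    return [a + (w - a) / (step + 1)
            for a, w in zip(avg_weights, current_weights)]

def ensemble_of_averages(per_model_logits):
    """EoA prediction: average the softmax outputs of independently
    trained, weight-averaged models, then take the argmax per example."""
    probs = []
    for logits in per_model_logits:
        # numerically stable softmax over the class dimension
        z = np.exp(logits - logits.max(axis=-1, keepdims=True))
        probs.append(z / z.sum(axis=-1, keepdims=True))
    return np.mean(probs, axis=0).argmax(axis=-1)
```

In practice the running average is updated once per training step (or every few steps) after an initial warm-up, and each member of the ensemble is the averaged model from one independent run, so no extra training cost is incurred beyond the independent runs themselves.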

Results

Task | Dataset | Metric | Value | Model
Domain Adaptation | PACS | Average Accuracy | 95.8 | Ensemble of Averages (RegNetY-16GF)
Domain Adaptation | PACS | Average Accuracy | 93.2 | Ensemble of Averages (ResNeXt-50 32x4d)
Domain Adaptation | PACS | Average Accuracy | 88.6 | Ensemble of Averages (ResNet-50)
Domain Adaptation | Office-Home | Average Accuracy | 83.9 | Ensemble of Averages (RegNetY-16GF)
Domain Adaptation | Office-Home | Average Accuracy | 80.2 | Ensemble of Averages (ResNeXt-50 32x4d)
Domain Adaptation | Office-Home | Average Accuracy | 72.5 | Ensemble of Averages (ResNet-50)
Domain Adaptation | DomainNet | Average Accuracy | 60.9 | Ensemble of Averages (RegNetY-16GF)
Domain Adaptation | DomainNet | Average Accuracy | 54.6 | Ensemble of Averages (ResNeXt-50 32x4d)
Domain Adaptation | DomainNet | Average Accuracy | 47.4 | Ensemble of Averages (ResNet-50)
Domain Adaptation | VLCS | Average Accuracy | 81.1 | Ensemble of Averages (RegNetY-16GF)
Domain Adaptation | VLCS | Average Accuracy | 80.4 | Ensemble of Averages (ResNeXt-50 32x4d)
Domain Adaptation | VLCS | Average Accuracy | 79.1 | Ensemble of Averages (ResNet-50)
Domain Adaptation | TerraIncognita | Average Accuracy | 61.1 | Ensemble of Averages (RegNetY-16GF)
Domain Adaptation | TerraIncognita | Average Accuracy | 55.2 | Ensemble of Averages (ResNeXt-50 32x4d)
Domain Adaptation | TerraIncognita | Average Accuracy | 52.3 | Ensemble of Averages (ResNet-50)
Domain Generalization | PACS | Average Accuracy | 95.8 | Ensemble of Averages (RegNetY-16GF)
Domain Generalization | PACS | Average Accuracy | 93.2 | Ensemble of Averages (ResNeXt-50 32x4d)
Domain Generalization | PACS | Average Accuracy | 88.6 | Ensemble of Averages (ResNet-50)
Domain Generalization | Office-Home | Average Accuracy | 83.9 | Ensemble of Averages (RegNetY-16GF)
Domain Generalization | Office-Home | Average Accuracy | 80.2 | Ensemble of Averages (ResNeXt-50 32x4d)
Domain Generalization | Office-Home | Average Accuracy | 72.5 | Ensemble of Averages (ResNet-50)
Domain Generalization | DomainNet | Average Accuracy | 60.9 | Ensemble of Averages (RegNetY-16GF)
Domain Generalization | DomainNet | Average Accuracy | 54.6 | Ensemble of Averages (ResNeXt-50 32x4d)
Domain Generalization | DomainNet | Average Accuracy | 47.4 | Ensemble of Averages (ResNet-50)
Domain Generalization | VLCS | Average Accuracy | 81.1 | Ensemble of Averages (RegNetY-16GF)
Domain Generalization | VLCS | Average Accuracy | 80.4 | Ensemble of Averages (ResNeXt-50 32x4d)
Domain Generalization | VLCS | Average Accuracy | 79.1 | Ensemble of Averages (ResNet-50)
Domain Generalization | TerraIncognita | Average Accuracy | 61.1 | Ensemble of Averages (RegNetY-16GF)
Domain Generalization | TerraIncognita | Average Accuracy | 55.2 | Ensemble of Averages (ResNeXt-50 32x4d)
Domain Generalization | TerraIncognita | Average Accuracy | 52.3 | Ensemble of Averages (ResNet-50)

Related Papers

Simulate, Refocus and Ensemble: An Attention-Refocusing Scheme for Domain Generalization (2025-07-17)
GLAD: Generalizable Tuning for Vision-Language Models (2025-07-17)
MoTM: Towards a Foundation Model for Time Series Imputation based on Continuous Modeling (2025-07-17)
InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing (2025-07-16)
From Physics to Foundation Models: A Review of AI-Driven Quantitative Remote Sensing Inversion (2025-07-11)
Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion (2025-07-08)
Prompt-Free Conditional Diffusion for Multi-object Image Augmentation (2025-07-08)
Integrated Structural Prompt Learning for Vision-Language Models (2025-07-08)