Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

Mitchell Wortsman, Gabriel Ilharco, Samir Yitzhak Gadre, Rebecca Roelofs, Raphael Gontijo-Lopes, Ari S. Morcos, Hongseok Namkoong, Ali Farhadi, Yair Carmon, Simon Kornblith, Ludwig Schmidt

2022-03-10 · Image Classification · Domain Generalization · Unsupervised Domain Adaptation · Out-of-Distribution Generalization

Paper · PDF · Code (official)

Abstract

The conventional recipe for maximizing model accuracy is to (1) train multiple models with various hyperparameters and (2) pick the individual model which performs best on a held-out validation set, discarding the remainder. In this paper, we revisit the second step of this procedure in the context of fine-tuning large pre-trained models, where fine-tuned models often appear to lie in a single low error basin. We show that averaging the weights of multiple models fine-tuned with different hyperparameter configurations often improves accuracy and robustness. Unlike a conventional ensemble, we may average many models without incurring any additional inference or memory costs -- we call the results "model soups." When fine-tuning large pre-trained models such as CLIP, ALIGN, and a ViT-G pre-trained on JFT, our soup recipe provides significant improvements over the best model in a hyperparameter sweep on ImageNet. The resulting ViT-G model, which attains 90.94% top-1 accuracy on ImageNet, achieved a new state of the art. Furthermore, we show that the model soup approach extends to multiple image classification and natural language processing tasks, improves out-of-distribution performance, and improves zero-shot performance on new downstream tasks. Finally, we analytically relate the performance similarity of weight-averaging and logit-ensembling to flatness of the loss and confidence of the predictions, and validate this relation empirically. Code is available at https://github.com/mlfoundations/model-soups.
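The abstract above describes the core recipe: average the weights of several models fine-tuned from the same pre-trained checkpoint, either uniformly or greedily (adding models in order of held-out accuracy only when the running average does not get worse). A minimal, framework-free sketch of both variants follows; parameters are modeled here as plain dicts of float lists rather than real tensors, and the names `uniform_soup`, `greedy_soup`, and `accuracy_fn` are illustrative assumptions, not the authors' code (the official implementation is at the GitHub link in the abstract):

```python
def uniform_soup(state_dicts):
    """Elementwise average of parameters across fine-tuned models.

    Each state dict maps a parameter name to a flat list of floats
    (a stand-in for a real framework state dict)."""
    n = len(state_dicts)
    return {
        name: [sum(sd[name][i] for sd in state_dicts) / n
               for i in range(len(state_dicts[0][name]))]
        for name in state_dicts[0]
    }

def greedy_soup(state_dicts, accuracy_fn):
    """Greedy soup: sort models by held-out accuracy, then add each to
    the running average only if held-out accuracy does not drop."""
    ranked = sorted(state_dicts, key=accuracy_fn, reverse=True)
    members = [ranked[0]]
    best = accuracy_fn(uniform_soup(members))
    for sd in ranked[1:]:
        candidate = uniform_soup(members + [sd])
        acc = accuracy_fn(candidate)
        if acc >= best:          # keep the model only if it helps (or ties)
            members.append(sd)
            best = acc
    return uniform_soup(members)
```

The key property, as the abstract notes, is that the soup is a single model: inference and memory cost are identical to one fine-tuned model, unlike a logit ensemble, which must run every member.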

Results

Task | Dataset | Metric | Value | Model
--- | --- | --- | --- | ---
Domain Adaptation | ImageNet-R | Top-1 Error | 4.54 | Model soups (ViT-G/14)
Domain Adaptation | ImageNet-R | Top-1 Error Rate | 3.9 | Model soups (BASIC-L)
Domain Adaptation | ImageNet-R | Top-1 Error Rate | 4.54 | Model soups (ViT-G/14)
Domain Adaptation | ImageNet-A | Top-1 Accuracy (%) | 94.17 | Model soups (BASIC-L)
Domain Adaptation | ImageNet-A | Top-1 Accuracy (%) | 92.67 | Model soups (ViT-G/14)
Domain Adaptation | ImageNet-Sketch | Top-1 Accuracy | 77.18 | Model soups (BASIC-L)
Domain Adaptation | ImageNet-Sketch | Top-1 Accuracy | 74.24 | Model soups (ViT-G/14)
Image Classification | ImageNet V2 | Top-1 Accuracy | 84.63 | Model soups (BASIC-L)
Image Classification | ImageNet V2 | Top-1 Accuracy | 84.22 | Model soups (ViT-G/14)
Image Classification | ObjectNet | Top-1 Accuracy | 79.03 | Baseline (ViT-G/14)
Image Classification | ObjectNet | Top-1 Accuracy | 78.52 | Model soups (ViT-G/14)
Unsupervised Domain Adaptation | ImageNet-R | Top-1 Error | 4.54 | Model soups (ViT-G/14)
Domain Generalization | ImageNet-R | Top-1 Error Rate | 3.9 | Model soups (BASIC-L)
Domain Generalization | ImageNet-R | Top-1 Error Rate | 4.54 | Model soups (ViT-G/14)
Domain Generalization | ImageNet-A | Top-1 Accuracy (%) | 94.17 | Model soups (BASIC-L)
Domain Generalization | ImageNet-A | Top-1 Accuracy (%) | 92.67 | Model soups (ViT-G/14)
Domain Generalization | ImageNet-Sketch | Top-1 Accuracy | 77.18 | Model soups (BASIC-L)
Domain Generalization | ImageNet-Sketch | Top-1 Accuracy | 74.24 | Model soups (ViT-G/14)

Related Papers

- Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations (2025-07-18)
- Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
- Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy (2025-07-17)
- Federated Learning for Commercial Image Sources (2025-07-17)
- MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
- Simulate, Refocus and Ensemble: An Attention-Refocusing Scheme for Domain Generalization (2025-07-17)
- GLAD: Generalizable Tuning for Vision-Language Models (2025-07-17)
- MoTM: Towards a Foundation Model for Time Series Imputation based on Continuous Modeling (2025-07-17)