
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Domain-independent Dominance of Adaptive Methods

Pedro Savarese, David McAllester, Sudarshan Babu, Michael Maire

Date: 2019-12-04
Venue: CVPR 2021
Tasks: Image Classification, Stochastic Optimization, Language Modelling

Abstract

From a simplified analysis of adaptive methods, we derive AvaGrad, a new optimizer which outperforms SGD on vision tasks when its adaptability is properly tuned. We observe that the power of our method is partially explained by a decoupling of learning rate and adaptability, greatly simplifying hyperparameter search. In light of this observation, we demonstrate that, against conventional wisdom, Adam can also outperform SGD on vision tasks, as long as the coupling between its learning rate and adaptability is taken into account. In practice, AvaGrad matches the best results, as measured by generalization accuracy, delivered by any existing optimizer (SGD or adaptive) across image classification (CIFAR, ImageNet) and character-level language modelling (Penn Treebank) tasks.
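The coupling between Adam's learning rate and its adaptability parameter can be illustrated with a small sketch. This is not the authors' AvaGrad implementation; it is the standard scalar Adam update with bias correction, and the helper names `adam_step` and `first_step_size`, as well as the gradient value 0.01, are hypothetical choices made for illustration. The sketch assumes that "adaptability" corresponds to Adam's `eps` term: once `eps` is large relative to the square root of the second-moment estimate, the step behaves roughly like `lr / eps` times the momentum, so changing `eps` alone also changes the effective step size.

```python
import math

def adam_step(grad, m, v, t, lr, eps, beta1=0.9, beta2=0.999):
    """One standard Adam update for a single scalar parameter.

    Returns the parameter change and the updated moment estimates.
    """
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    delta = -lr * m_hat / (math.sqrt(v_hat) + eps)
    return delta, m, v

def first_step_size(lr, eps, grad=0.01):
    """Magnitude of the very first Adam step (hypothetical helper)."""
    delta, _, _ = adam_step(grad, 0.0, 0.0, 1, lr, eps)
    return abs(delta)

if __name__ == "__main__":
    # Small eps: the step is close to lr, regardless of gradient scale.
    print(first_step_size(lr=1e-3, eps=1e-8))
    # Large eps at the same lr: the step shrinks by roughly grad / eps.
    print(first_step_size(lr=1e-3, eps=1.0))
    # Scaling lr together with eps keeps the step roughly constant.
    print(first_step_size(lr=1.0, eps=1.0), first_step_size(lr=10.0, eps=10.0))
```

Under these assumptions, a hyperparameter search over `eps` at a fixed learning rate mixes two effects, which is one way to read the abstract's remark that the coupling has to be taken into account when tuning Adam.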

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Stochastic Optimization | CIFAR-10 WRN-28-10 - 200 Epochs | Accuracy | 96.36 | Adam (eps-adjusted) |
| Stochastic Optimization | CIFAR-10 WRN-28-10 - 200 Epochs | Accuracy | 96.2 | AvaGrad |
| Stochastic Optimization | CIFAR-10 WRN-28-10 - 200 Epochs | Accuracy | 96.14 | SGD |
| Stochastic Optimization | CIFAR-10 WRN-28-10 - 200 Epochs | Accuracy | 95.92 | AdaShift |
| Stochastic Optimization | CIFAR-10 WRN-28-10 - 200 Epochs | Accuracy | 95.89 | AdamW |
| Stochastic Optimization | CIFAR-10 WRN-28-10 - 200 Epochs | Accuracy | 94.6 | AdaBound |
| Stochastic Optimization | ImageNet ResNet-50 - 90 Epochs | Top 1 Accuracy | 76.51 | AvaGrad |
| Stochastic Optimization | ImageNet ResNet-50 - 90 Epochs | Top 1 Accuracy | 75.99 | SGD |
| Stochastic Optimization | ImageNet ResNet-50 - 90 Epochs | Top 1 Accuracy | 72.9 | AdamW |
| Stochastic Optimization | ImageNet ResNet-50 - 90 Epochs | Top 1 Accuracy | 72.01 | AdaBound |
| Stochastic Optimization | Penn Treebank (Character Level) 3x1000 LSTM - 500 Epochs | Bit per Character (BPC) | 1.175 | AvaGrad |
| Stochastic Optimization | Penn Treebank (Character Level) 3x1000 LSTM - 500 Epochs | Bit per Character (BPC) | 1.23 | AdamW |
| Stochastic Optimization | Penn Treebank (Character Level) 3x1000 LSTM - 500 Epochs | Bit per Character (BPC) | 1.274 | AdaShift |
| Stochastic Optimization | Penn Treebank (Character Level) 3x1000 LSTM - 500 Epochs | Bit per Character (BPC) | 2.863 | AdaBound |
| Stochastic Optimization | CIFAR-100 WRN-28-10 - 200 Epochs | Accuracy | 81.24 | AvaGrad |
| Stochastic Optimization | CIFAR-100 WRN-28-10 - 200 Epochs | Accuracy | 81.12 | AdaShift |
| Stochastic Optimization | CIFAR-100 WRN-28-10 - 200 Epochs | Accuracy | 81.04 | Adam (eps-adjusted) |
| Stochastic Optimization | CIFAR-100 WRN-28-10 - 200 Epochs | Accuracy | 80.95 | SGD |
| Stochastic Optimization | CIFAR-100 WRN-28-10 - 200 Epochs | Accuracy | 79.87 | AdamW |
| Stochastic Optimization | CIFAR-100 WRN-28-10 - 200 Epochs | Accuracy | 77.24 | AdaBound |

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations (2025-07-18)
Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy (2025-07-17)
Federated Learning for Commercial Image Sources (2025-07-17)
MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)