The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization

Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kadavath, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Zhu, Samyak Parajuli, Mike Guo, Dawn Song, Jacob Steinhardt, Justin Gilmer

2020-06-29 | ICCV 2021 10 | Data Augmentation | Domain Generalization | Out-of-Distribution Generalization

Abstract

We introduce four new real-world distribution shift datasets consisting of changes in image style, image blurriness, geographic location, camera operation, and more. With our new datasets, we take stock of previously proposed methods for improving out-of-distribution robustness and put them to the test. We find that using larger models and artificial data augmentations can improve robustness on real-world distribution shifts, contrary to claims in prior work. We find improvements in artificial robustness benchmarks can transfer to real-world distribution shifts, contrary to claims in prior work. Motivated by our observation that data augmentations can help with real-world distribution shifts, we also introduce a new data augmentation method which advances the state-of-the-art and outperforms models pretrained with 1000 times more labeled data. Overall we find that some methods consistently help with distribution shifts in texture and local image statistics, but these methods do not help with some other distribution shifts like geographic changes. Our results show that future research must study multiple distribution shifts simultaneously, as we demonstrate that no evaluated method consistently improves robustness.

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Domain Adaptation | ImageNet-R | Top-1 Error Rate | 53.2 | DeepAugment+AugMix (ResNet-50) |
| Domain Adaptation | ImageNet-R | Top-1 Error Rate | 57.8 | DeepAugment (ResNet-50) |
| Domain Adaptation | ImageNet-C | mean Corruption Error (mCE) | 60.4 | DeepAugment (ResNet-50) |
| Domain Adaptation | VizWiz-Classification | Accuracy - All Images | 41.3 | ResNet-50 (deepaugment) |
| Domain Adaptation | VizWiz-Classification | Accuracy - Clean Images | 46 | ResNet-50 (deepaugment) |
| Domain Adaptation | VizWiz-Classification | Accuracy - Corrupted Images | 34.9 | ResNet-50 (deepaugment) |
| Domain Adaptation | VizWiz-Classification | Accuracy - All Images | 40.3 | ResNet-50 (deepaugment+augmix) |
| Domain Adaptation | VizWiz-Classification | Accuracy - Clean Images | 44.5 | ResNet-50 (deepaugment+augmix) |
| Domain Adaptation | VizWiz-Classification | Accuracy - Corrupted Images | 34.1 | ResNet-50 (deepaugment+augmix) |
| Domain Generalization | ImageNet-R | Top-1 Error Rate | 53.2 | DeepAugment+AugMix (ResNet-50) |
| Domain Generalization | ImageNet-R | Top-1 Error Rate | 57.8 | DeepAugment (ResNet-50) |
| Domain Generalization | ImageNet-C | mean Corruption Error (mCE) | 60.4 | DeepAugment (ResNet-50) |
| Domain Generalization | VizWiz-Classification | Accuracy - All Images | 41.3 | ResNet-50 (deepaugment) |
| Domain Generalization | VizWiz-Classification | Accuracy - Clean Images | 46 | ResNet-50 (deepaugment) |
| Domain Generalization | VizWiz-Classification | Accuracy - Corrupted Images | 34.9 | ResNet-50 (deepaugment) |
| Domain Generalization | VizWiz-Classification | Accuracy - All Images | 40.3 | ResNet-50 (deepaugment+augmix) |
| Domain Generalization | VizWiz-Classification | Accuracy - Clean Images | 44.5 | ResNet-50 (deepaugment+augmix) |
| Domain Generalization | VizWiz-Classification | Accuracy - Corrupted Images | 34.1 | ResNet-50 (deepaugment+augmix) |
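
The mean Corruption Error (mCE) reported for ImageNet-C is not a plain error rate. In the ImageNet-C benchmark, each corruption type's top-1 error is summed over its severity levels, divided by the same sum for a baseline model (AlexNet in the original benchmark), and the resulting ratios are averaged across corruption types. A minimal sketch of that computation is given here; the function names and every error value are hypothetical placeholders chosen for illustration, not figures from the paper.

```python
from statistics import mean


def corruption_error(model_errs, baseline_errs):
    """CE for one corruption type: the model's error summed over
    severity levels, divided by the baseline's summed error."""
    return sum(model_errs) / sum(baseline_errs)


def mean_corruption_error(model, baseline):
    """mCE in percent: the average CE over all corruption types."""
    ratios = [corruption_error(model[c], baseline[c]) for c in model]
    return 100.0 * mean(ratios)


# Hypothetical top-1 error rates (as fractions) at five severity levels.
# These are illustrative numbers only, NOT results from the paper.
model_errors = {
    "gaussian_noise": [0.30, 0.40, 0.50, 0.60, 0.70],
    "motion_blur": [0.20, 0.30, 0.40, 0.50, 0.60],
}
baseline_errors = {
    "gaussian_noise": [0.70, 0.75, 0.80, 0.85, 0.90],
    "motion_blur": [0.60, 0.70, 0.80, 0.90, 1.00],
}

if __name__ == "__main__":
    mce = mean_corruption_error(model_errors, baseline_errors)
    print(f"mCE = {mce:.2f}")
```

Because each corruption is normalized by the baseline, an mCE below 100 means the model is, on average, more robust than the baseline, and lower is better.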