Papers With Code

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Pyramid Adversarial Training Improves ViT Performance

Charles Herrmann, Kyle Sargent, Lu Jiang, Ramin Zabih, Huiwen Chang, Ce Liu, Dilip Krishnan, Deqing Sun

Published 2021-11-30 · CVPR 2022
Tasks: Image Classification · Data Augmentation · Domain Generalization · Adversarial Attack
Links: Paper · PDF · Code (official)

Abstract

Aggressive data augmentation is a key component of the strong generalization capabilities of Vision Transformer (ViT). One such data augmentation technique is adversarial training (AT); however, many prior works have shown that this often results in poor clean accuracy. In this work, we present pyramid adversarial training (PyramidAT), a simple and effective technique to improve ViT's overall performance. We pair it with a "matched" Dropout and stochastic depth regularization, which adopts the same Dropout and stochastic depth configuration for the clean and adversarial samples. Similar to the improvements on CNNs by AdvProp (not directly applicable to ViT), our pyramid adversarial training breaks the trade-off between in-distribution accuracy and out-of-distribution robustness for ViT and related architectures. It leads to 1.82% absolute improvement on ImageNet clean accuracy for the ViT-B model when trained only on ImageNet-1K data, while simultaneously boosting performance on 7 ImageNet robustness metrics, by absolute numbers ranging from 1.76% to 15.68%. We set a new state-of-the-art for ImageNet-C (41.42 mCE), ImageNet-R (53.92%), and ImageNet-Sketch (41.04%) without extra data, using only the ViT-B/16 backbone and our pyramid adversarial training. Our code is publicly available at pyramidat.github.io.
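As described in the abstract, the pyramid attack perturbs the input at several spatial scales at once: each scale contributes one sign-gradient value per block of pixels, weighted by a per-scale multiplier, so coarse scales move whole regions while the finest scale acts per pixel. The sketch below illustrates that structure in NumPy; the scales and weights are illustrative placeholders, not the paper's exact hyperparameters, and a real implementation would compute `grad` by backpropagating the training loss to the input.

```python
import numpy as np

def pyramid_perturbation(grad, scales=(32, 16, 1), weights=(20.0, 10.0, 1.0)):
    """Build a multi-scale (pyramid) adversarial perturbation from an
    input-space gradient.

    At scale s, the gradient is block-averaged over s x s blocks, its sign
    is taken (FGSM-style step), and the result is upsampled back to full
    resolution so every pixel in a block shares the same perturbation.
    The per-scale steps are weighted and summed.

    grad: (H, W) array; H and W must be divisible by every scale.
    """
    h, w = grad.shape
    delta = np.zeros_like(grad, dtype=float)
    for s, m in zip(scales, weights):
        # Average the gradient within each s x s block -> (H/s, W/s).
        coarse = grad.reshape(h // s, s, w // s, s).mean(axis=(1, 3))
        # Sign step, upsampled back via nearest-neighbor repetition.
        step = np.sign(coarse).repeat(s, axis=0).repeat(s, axis=1)
        delta += m * step
    return delta
```

In an actual training loop this perturbation would be projected to a per-scale epsilon ball and added to the clean image before the adversarial forward pass; the "matched" regularization the abstract mentions amounts to reusing the same Dropout and stochastic-depth configuration for both the clean and the adversarial pass.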

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Domain Adaptation | ImageNet-R | Top-1 Error Rate | 42.16 | Pyramid Adversarial Training Improves ViT (Im21k) |
| Domain Adaptation | ImageNet-R | Top-1 Error Rate | 46.08 | Pyramid Adversarial Training Improves ViT |
| Domain Adaptation | ImageNet-A | Top-1 accuracy % | 62.44 | Pyramid Adversarial Training Improves ViT (Im21k) |
| Domain Adaptation | ImageNet-A | Top-1 accuracy % | 36.41 | Pyramid Adversarial Training Improves ViT (384x384) |
| Domain Adaptation | ImageNet-C | mean Corruption Error (mCE) | 36.8 | Pyramid Adversarial Training Improves ViT (Im21k) |
| Domain Adaptation | ImageNet-C | mean Corruption Error (mCE) | 41.42 | Pyramid Adversarial Training Improves ViT |
| Domain Adaptation | ImageNet-Sketch | Top-1 accuracy | 46.03 | Pyramid Adversarial Training Improves ViT (Im21k) |
| Domain Adaptation | ImageNet-Sketch | Top-1 accuracy | 41.04 | Pyramid Adversarial Training Improves ViT |
| Image Classification | ObjectNet | Top-1 Accuracy | 49.39 | ViT-B/16 (512x512) + Pyramid |
| Image Classification | ObjectNet | Top-1 Accuracy | 47.53 | ViT-B/16 (512x512) + Pixel |
| Image Classification | ObjectNet | Top-1 Accuracy | 46.68 | ViT-B/16 (512x512) |
| Image Classification | ObjectNet | Top-1 Accuracy | 39.79 | RegViT on 384x384 + Adv Pyramid |
| Image Classification | ObjectNet | Top-1 Accuracy | 37.41 | RegViT on 384x384 + Adv Pixel |
| Image Classification | ObjectNet | Top-1 Accuracy | 35.59 | RegViT on 384x384 |
| Image Classification | ObjectNet | Top-1 Accuracy | 34.83 | RegViT on 384x384 + Random Pyramid |
| Image Classification | ObjectNet | Top-1 Accuracy | 34.12 | RegViT on 384x384 + Random Pixel |
| Image Classification | ObjectNet | Top-1 Accuracy | 32.92 | RegViT (RandAug) + Adv Pyramid |
| Image Classification | ObjectNet | Top-1 Accuracy | 30.98 | Discrete ViT + Pixel |
| Image Classification | ObjectNet | Top-1 Accuracy | 30.28 | Discrete ViT + Pyramid |
| Image Classification | ObjectNet | Top-1 Accuracy | 30.11 | RegViT (RandAug) + Adv Pixel |
| Image Classification | ObjectNet | Top-1 Accuracy | 29.95 | Discrete ViT |
| Image Classification | ObjectNet | Top-1 Accuracy | 29.41 | RegViT (RandAug) + Random Pyramid |
| Image Classification | ObjectNet | Top-1 Accuracy | 29.3 | RegViT (RandAug) |
| Image Classification | ObjectNet | Top-1 Accuracy | 28.72 | RegViT (RandAug) + Random Pixel |
| Image Classification | ObjectNet | Top-1 Accuracy | 28.6 | MLP-Mixer + Pyramid |
| Image Classification | ObjectNet | Top-1 Accuracy | 25.9 | MLP-Mixer |
| Image Classification | ObjectNet | Top-1 Accuracy | 25.65 | ViT + MixUp |
| Image Classification | ObjectNet | Top-1 Accuracy | 24.75 | MLP-Mixer + Pixel |
| Image Classification | ObjectNet | Top-1 Accuracy | 21.61 | ViT + CutMix |
| Image Classification | ObjectNet | Top-1 Accuracy | 17.36 | ViT |
| Domain Generalization | ImageNet-R | Top-1 Error Rate | 42.16 | Pyramid Adversarial Training Improves ViT (Im21k) |
| Domain Generalization | ImageNet-R | Top-1 Error Rate | 46.08 | Pyramid Adversarial Training Improves ViT |
| Domain Generalization | ImageNet-A | Top-1 accuracy % | 62.44 | Pyramid Adversarial Training Improves ViT (Im21k) |
| Domain Generalization | ImageNet-A | Top-1 accuracy % | 36.41 | Pyramid Adversarial Training Improves ViT (384x384) |
| Domain Generalization | ImageNet-C | mean Corruption Error (mCE) | 36.8 | Pyramid Adversarial Training Improves ViT (Im21k) |
| Domain Generalization | ImageNet-C | mean Corruption Error (mCE) | 41.42 | Pyramid Adversarial Training Improves ViT |
| Domain Generalization | ImageNet-Sketch | Top-1 accuracy | 46.03 | Pyramid Adversarial Training Improves ViT (Im21k) |
| Domain Generalization | ImageNet-Sketch | Top-1 accuracy | 41.04 | Pyramid Adversarial Training Improves ViT |
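The ImageNet-C rows above report mean Corruption Error (mCE), where lower is better. As defined by the ImageNet-C benchmark (Hendrycks & Dietterich, 2019), each corruption type's error is summed over its five severity levels and normalized by AlexNet's error on the same corruption, then averaged over corruption types. A minimal sketch of that computation, with the dictionary layout being an assumption of this example:

```python
def mean_corruption_error(model_err, alexnet_err):
    """mCE as defined for ImageNet-C.

    For each corruption type c:
        CE_c = sum over severities of the model's top-1 error
               / the same sum for AlexNet.
    mCE is the average of CE_c over corruption types, as a percentage.

    model_err, alexnet_err: dict mapping corruption name -> list of
    per-severity top-1 error rates (fractions in [0, 1]).
    """
    ces = [sum(errs) / sum(alexnet_err[c]) for c, errs in model_err.items()]
    return 100.0 * sum(ces) / len(ces)
```

The AlexNet normalization is what makes mCE values comparable across corruption types of very different intrinsic difficulty.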

Related Papers

Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations (2025-07-18)
Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy (2025-07-17)
Federated Learning for Commercial Image Sources (2025-07-17)
MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management (2025-07-17)
Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images (2025-07-17)
Simulate, Refocus and Ensemble: An Attention-Refocusing Scheme for Domain Generalization (2025-07-17)