Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Are Transformers More Robust Than CNNs?

Yutong Bai, Jieru Mei, Alan Yuille, Cihang Xie

2021-11-10 · NeurIPS 2021 · Adversarial Robustness
Paper · PDF · Code (official)

Abstract

Transformers have emerged as a powerful tool for visual recognition. In addition to demonstrating competitive performance on a broad range of visual benchmarks, recent works also argue that Transformers are much more robust than Convolutional Neural Networks (CNNs). Surprisingly, however, we find these conclusions are drawn from unfair experimental settings, where Transformers and CNNs are compared at different scales and trained with distinct frameworks. In this paper, we aim to provide the first fair and in-depth comparison between Transformers and CNNs, focusing on robustness evaluations. With our unified training setup, we first challenge the previous belief that Transformers outshine CNNs in adversarial robustness. More surprisingly, we find that CNNs can easily be as robust as Transformers against adversarial attacks if they properly adopt Transformers' training recipes. Regarding generalization to out-of-distribution samples, we show that pre-training on (external) large-scale datasets is not a fundamental requirement for Transformers to outperform CNNs. Moreover, our ablations suggest that this stronger generalization largely stems from the Transformer's self-attention-like architecture itself, rather than from other parts of the training setup. We hope this work helps the community better understand and benchmark the robustness of Transformers and CNNs. The code and models are publicly available at https://github.com/ytongbai/ViTs-vs-CNNs.
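The adversarial-robustness comparison above boils down to attacking both architectures under the same attack budget and measuring how much accuracy survives. Below is a minimal sketch (not the paper's released evaluation code) of how one might run such a comparison with an untargeted L-inf PGD attack; the timm model names are standard, but the pretrained flag, attack hyperparameters, and the random batch are placeholders you would replace with the checkpoints from https://github.com/ytongbai/ViTs-vs-CNNs and a real ImageNet validation loader.

```python
# Sketch: compare ResNet-50 and DeiT-S under an untargeted L-inf PGD attack.
# Assumes inputs in [0, 1]; eps/alpha/steps are illustrative, not the paper's exact budget.
import torch
import torch.nn.functional as F
import timm


def pgd_attack(model, images, labels, eps=4 / 255, alpha=1 / 255, steps=10):
    """Maximize cross-entropy within an L-inf ball of radius eps around the input."""
    adv = images.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), labels)
        grad = torch.autograd.grad(loss, adv)[0]
        # Ascend the loss, then project back into the eps ball and the valid pixel range.
        adv = adv.detach() + alpha * grad.sign()
        adv = images + (adv - images).clamp(-eps, eps)
        adv = adv.clamp(0.0, 1.0)
    return adv.detach()


def robust_accuracy(model, images, labels, **pgd_kwargs):
    adv = pgd_attack(model, images, labels, **pgd_kwargs)
    with torch.no_grad():
        return (model(adv).argmax(dim=1) == labels).float().mean().item()


if __name__ == "__main__":
    # Placeholder batch; swap in a real ImageNet validation loader and the paper's weights.
    images = torch.rand(8, 3, 224, 224)
    labels = torch.randint(0, 1000, (8,))
    for name in ("resnet50", "deit_small_patch16_224"):
        model = timm.create_model(name, pretrained=False).eval()
        print(name, "robust acc:", robust_accuracy(model, images, labels))
```

The key point of the paper's setup is that both models go through the same attack and the same training recipe, so any remaining gap can be attributed to the architecture rather than to tuning differences.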

Results

Task | Dataset | Metric | Value | Model
Adversarial Robustness | ImageNet | Accuracy | 77.4 | ResNet-50 (SGD, Cosine)
Adversarial Robustness | ImageNet | Accuracy | 76.9 | ResNet-50 (SGD, Step)
Adversarial Robustness | ImageNet | Accuracy | 76.8 | DeiT-S (AdamW, Cosine)
Adversarial Robustness | ImageNet | Accuracy | 76.4 | ResNet-50 (AdamW, Cosine)
Adversarial Robustness | Stylized ImageNet | Accuracy | 13 | DeiT-S (AdamW, Cosine)
Adversarial Robustness | Stylized ImageNet | Accuracy | 8.4 | ResNet-50 (SGD, Cosine)
Adversarial Robustness | Stylized ImageNet | Accuracy | 8.3 | ResNet-50 (SGD, Step)
Adversarial Robustness | Stylized ImageNet | Accuracy | 8.1 | ResNet-50 (AdamW, Cosine)
Adversarial Robustness | ImageNet-C | mean Corruption Error (mCE) | 48 | DeiT-S (AdamW, Cosine)
Adversarial Robustness | ImageNet-C | mean Corruption Error (mCE) | 56.9 | ResNet-50 (SGD, Cosine)
Adversarial Robustness | ImageNet-C | mean Corruption Error (mCE) | 57.9 | ResNet-50 (SGD, Step)
Adversarial Robustness | ImageNet-C | mean Corruption Error (mCE) | 59.3 | ResNet-50 (AdamW, Cosine)
Adversarial Robustness | ImageNet-A | Accuracy | 12.2 | DeiT-S (AdamW, Cosine)
Adversarial Robustness | ImageNet-A | Accuracy | 3.3 | ResNet-50 (SGD, Cosine)
Adversarial Robustness | ImageNet-A | Accuracy | 3.2 | ResNet-50 (SGD, Step)
Adversarial Robustness | ImageNet-A | Accuracy | 3.1 | ResNet-50 (AdamW, Cosine)
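Note that on ImageNet-C the metric is mean Corruption Error (mCE), where lower is better, unlike the accuracy rows. As a reference for reading those rows, here is a minimal sketch of the mCE computation from Hendrycks & Dietterich (2019): per-corruption error rates are summed over the five severities, normalized by AlexNet's errors on the same corruption, and averaged. The error rates in the example are made up for illustration; producing real ones requires evaluating a model on every ImageNet-C corruption and severity.

```python
# Sketch of mean Corruption Error (mCE) for ImageNet-C. mCE = 100 means
# "as fragile as AlexNet"; lower is better.
from typing import Dict, List


def mean_corruption_error(
    model_err: Dict[str, List[float]],    # corruption -> error rate per severity (0..1)
    alexnet_err: Dict[str, List[float]],  # same structure, AlexNet baseline
) -> float:
    ces = []
    for corruption, errs in model_err.items():
        baseline = alexnet_err[corruption]
        # CE_c: model errors summed over severities, normalized by AlexNet's errors.
        ces.append(sum(errs) / sum(baseline))
    return 100.0 * sum(ces) / len(ces)  # average over corruptions, as a percentage


# Toy example with two corruptions and hypothetical error rates.
model = {"gaussian_noise": [0.30, 0.38, 0.47, 0.58, 0.66],
         "motion_blur":    [0.28, 0.35, 0.45, 0.55, 0.63]}
alexnet = {"gaussian_noise": [0.55, 0.65, 0.75, 0.85, 0.90],
           "motion_blur":    [0.50, 0.60, 0.72, 0.82, 0.88]}
print(round(mean_corruption_error(model, alexnet), 1))
```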

Related Papers

Bridging Robustness and Generalization Against Word Substitution Attacks in NLP via the Growth Bound Matrix Approach (2025-07-14)
Tail-aware Adversarial Attacks: A Distributional Approach to Efficient LLM Jailbreaking (2025-07-06)
Evaluating the Evaluators: Trust in Adversarial Robustness Tests (2025-07-04)
Rectifying Adversarial Sample with Low Entropy Prior for Test-Time Defense (2025-07-04)
Is Reasoning All You Need? Probing Bias in the Age of Reasoning Language Models (2025-07-03)
NIC-RobustBench: A Comprehensive Open-Source Toolkit for Neural Image Compression and Robustness Analysis (2025-06-23)
PRISON: Unmasking the Criminal Potential of Large Language Models (2025-06-19)
Intriguing Frequency Interpretation of Adversarial Robustness for CNNs and ViTs (2025-06-15)