Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Training data-efficient image transformers & distillation through attention

Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou

Published: 2020-12-23
Tasks: Document Layout Analysis, Image Classification, Document Image Classification, Fine-Grained Image Classification
Links: Paper | PDF | Code (official implementation, plus community implementations)

Abstract

Recently, neural networks based purely on attention were shown to address image understanding tasks such as image classification. However, these visual transformers are pre-trained on hundreds of millions of images using expensive infrastructure, thereby limiting their adoption. In this work, we produce a competitive convolution-free transformer by training on ImageNet only. We train it on a single computer in less than 3 days. Our reference vision transformer (86M parameters) achieves a top-1 accuracy of 83.1% (single-crop evaluation) on ImageNet with no external data. More importantly, we introduce a teacher-student strategy specific to transformers. It relies on a distillation token ensuring that the student learns from the teacher through attention. We show the value of this token-based distillation, especially when using a convnet as a teacher. This leads us to report results competitive with convnets both on ImageNet (where we obtain up to 85.2% accuracy) and when transferring to other tasks. We share our code and models.
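The abstract's teacher-student strategy can be illustrated with the hard-distillation objective: the class token is supervised by the true label, while the distillation token is supervised by the teacher's hard (argmax) prediction, and the two cross-entropies are averaged. Below is a minimal NumPy sketch of that loss under those stated assumptions; the function names are illustrative and not taken from the paper's released code.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def hard_distillation_loss(cls_logits, dist_logits, teacher_logits, labels):
    """DeiT-style hard distillation (sketch):
    - cls_logits:     student predictions from the class token, shape (N, C)
    - dist_logits:    student predictions from the distillation token, (N, C)
    - teacher_logits: teacher predictions, (N, C)
    - labels:         ground-truth class indices, (N,)
    The class token is trained against the true labels; the distillation
    token is trained against the teacher's argmax; the result is the
    average of the two cross-entropies."""
    n = labels.shape[0]
    teacher_labels = teacher_logits.argmax(axis=-1)  # hard teacher targets
    ce_true = -log_softmax(cls_logits)[np.arange(n), labels].mean()
    ce_teacher = -log_softmax(dist_logits)[np.arange(n), teacher_labels].mean()
    return 0.5 * ce_true + 0.5 * ce_teacher
```

At inference time, the paper fuses the two heads; a simple way to mirror that in this sketch is to average the softmax outputs of `cls_logits` and `dist_logits`.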

Results

Task                    | Dataset                    | Metric             | Value | Model
Image Classification    | CIFAR-10                   | Percentage correct | 99.1  | DeiT-B
Image Classification    | CIFAR-100                  | Percentage correct | 90.8  | DeiT-B
Image Classification    | ImageNet-1K                | GFLOPs             | 4.6   | DeiT-S
Image Classification    | ImageNet-1K                | Top 1 Accuracy     | 79.8  | DeiT-S
Image Classification    | ImageNet-1K                | GFLOPs             | 1.2   | DeiT-T
Image Classification    | ImageNet-1K                | Top 1 Accuracy     | 72.2  | DeiT-T
Document Layout Analysis| PubLayNet val              | Figure             | 0.957 | DeiT-B
Document Layout Analysis| PubLayNet val              | List               | 0.921 | DeiT-B
Document Layout Analysis| PubLayNet val              | Overall            | 0.932 | DeiT-B
Document Layout Analysis| PubLayNet val              | Table              | 0.972 | DeiT-B
Document Layout Analysis| PubLayNet val              | Text               | 0.934 | DeiT-B
Document Layout Analysis| PubLayNet val              | Title              | 0.874 | DeiT-B

Related Papers

- Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations (2025-07-18)
- Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
- Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy (2025-07-17)
- Federated Learning for Commercial Image Sources (2025-07-17)
- MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
- Hashed Watermark as a Filter: Defeating Forging and Overwriting Attacks in Weight-based Neural Network Watermarking (2025-07-15)
- Transferring Styles for Reduced Texture Bias and Improved Robustness in Semantic Segmentation Networks (2025-07-14)
- FedGSCA: Medical Federated Learning with Global Sample Selector and Client Adaptive Adjuster under Label Noise (2025-07-13)