Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference

Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou, Matthijs Douze

Published: 2021-04-02 · Venue: ICCV 2021 · Tasks: Image Classification, General Classification

Abstract

We design a family of image classification architectures that optimize the trade-off between accuracy and efficiency in a high-speed regime. Our work exploits recent findings in attention-based architectures, which are competitive on highly parallel processing hardware. We revisit principles from the extensive literature on convolutional neural networks to apply them to transformers, in particular activation maps with decreasing resolutions. We also introduce the attention bias, a new way to integrate positional information in vision transformers. As a result, we propose LeViT: a hybrid neural network for fast inference image classification. We consider different measures of efficiency on different hardware platforms, so as to best reflect a wide range of application scenarios. Our extensive experiments empirically validate our technical choices and show they are suitable for most architectures. Overall, LeViT significantly outperforms existing convnets and vision transformers with respect to the speed/accuracy trade-off. For example, at 80% ImageNet top-1 accuracy, LeViT is 5 times faster than EfficientNet on CPU. We release the code at https://github.com/facebookresearch/LeViT
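The attention bias mentioned in the abstract injects positional information directly into the attention logits. The dependency-free Python sketch below assumes the formulation described in the LeViT paper: for each head, a learned scalar indexed by the absolute row and column offsets (|dy|, |dx|) between a query and a key position is added to the scaled dot-product logit before the softmax. It handles a single head with plain lists for readability, not speed; the function names `relative_offset_index` and `biased_attention` are hypothetical and are not taken from the official repository.

```python
import math
import random


def relative_offset_index(h, w):
    """For an h x w grid of tokens (row-major), map every (query, key) pair
    to an index into a per-head bias table. Pairs with the same absolute
    offsets (|dy|, |dx|) share one entry, so the table has h * w entries."""
    points = [(y, x) for y in range(h) for x in range(w)]
    return [
        [abs(y1 - y2) * w + abs(x1 - x2) for (y2, x2) in points]
        for (y1, x1) in points
    ]


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention(q, k, v, bias_table, bias_idx):
    """Single-head attention whose logits are the scaled dot product plus a
    bias looked up by relative offset (assumed formulation, see lead-in).
    q, k: N x d lists; v: N x d_v lists; bias_table: list of h * w floats."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        logits = [
            scale * sum(a * b for a, b in zip(qi, kj)) + bias_table[bias_idx[i][j]]
            for j, kj in enumerate(k)
        ]
        weights = softmax(logits)
        # Weighted average of the value vectors, one output channel at a time.
        out.append([
            sum(wgt * vj[c] for wgt, vj in zip(weights, v))
            for c in range(len(v[0]))
        ])
    return out


if __name__ == "__main__":
    random.seed(0)
    h, w, d = 3, 3, 4  # toy 3x3 feature map with 4-dim queries, keys, values
    n = h * w
    q = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    k = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    v = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    # In a trained model these would be learned parameters; here they are random.
    bias_table = [random.gauss(0, 1) for _ in range(h * w)]
    out = biased_attention(q, k, v, bias_table, relative_offset_index(h, w))
    print(len(out), len(out[0]))  # prints: 9 4
```

Because the bias depends only on offsets, it is translation-invariant and costs just h * w scalars per head, which is why it can stand in for a positional embedding added to the tokens.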

Results

Task                 | Dataset          | Metric             | Value | Model
-------------------- | ---------------- | ------------------ | ----- | ----------
Image Classification | Stanford Cars    | Accuracy (%)       | 89.8  | LeViT-192
Image Classification | Stanford Cars    | Accuracy (%)       | 89.3  | LeViT-384
Image Classification | Stanford Cars    | Accuracy (%)       | 88.6  | LeViT-128
Image Classification | Stanford Cars    | Accuracy (%)       | 88.4  | LeViT-128S
Image Classification | Stanford Cars    | Accuracy (%)       | 88.2  | LeViT-256
Image Classification | ImageNet V2      | Top-1 Accuracy (%) | 71.4  | LeViT-384
Image Classification | ImageNet V2      | Top-1 Accuracy (%) | 69.9  | LeViT-256
Image Classification | ImageNet V2      | Top-1 Accuracy (%) | 68.7  | LeViT-192
Image Classification | ImageNet V2      | Top-1 Accuracy (%) | 67.5  | LeViT-128
Image Classification | ImageNet V2      | Top-1 Accuracy (%) | 63.9  | LeViT-128S
Image Classification | CIFAR-10         | Percentage correct | 98.2  | LeViT-192
Image Classification | CIFAR-10         | Percentage correct | 98.1  | LeViT-256
Image Classification | CIFAR-10         | Percentage correct | 98    | LeViT-384
Image Classification | CIFAR-10         | Percentage correct | 97.6  | LeViT-128
Image Classification | CIFAR-10         | Percentage correct | 97.5  | LeViT-128S
Image Classification | Flowers-102      | Accuracy (%)       | 98.3  | LeViT-384
Image Classification | Flowers-102      | Accuracy (%)       | 97.8  | LeViT-192
Image Classification | Flowers-102      | Accuracy (%)       | 97.7  | LeViT-256
Image Classification | Flowers-102      | Accuracy (%)       | 96.8  | LeViT-128S
Image Classification | iNaturalist 2019 | Top-1 Accuracy (%) | 74.3  | LeViT-384
Image Classification | iNaturalist 2019 | Top-1 Accuracy (%) | 72.3  | LeViT-256
Image Classification | iNaturalist 2019 | Top-1 Accuracy (%) | 70.8  | LeViT-192
Image Classification | iNaturalist 2019 | Top-1 Accuracy (%) | 68.4  | LeViT-128
Image Classification | iNaturalist 2019 | Top-1 Accuracy (%) | 66.5  | LeViT-128S
Image Classification | ImageNet         | GFLOPs             | 2.334 | LeViT-384
Image Classification | ImageNet         | GFLOPs             | 1.066 | LeViT-256
Image Classification | ImageNet         | GFLOPs             | 0.624 | LeViT-192
Image Classification | ImageNet         | GFLOPs             | 0.376 | LeViT-128
Image Classification | ImageNet         | GFLOPs             | 0.288 | LeViT-128S
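As a rough illustration of the speed/accuracy trade-off across the LeViT variants, the sketch below pairs each model's ImageNet V2 top-1 accuracy with its ImageNet GFLOPs from the results table and computes how many extra GFLOPs each additional top-1 point costs when moving to the next larger model. This is a back-of-the-envelope check under stated assumptions, not an analysis from the paper: GFLOPs are not measured throughput (the paper reports speed on several hardware platforms), and ImageNet V2 is a different test set from the ImageNet validation set behind the 80% example in the abstract. The names `RESULTS` and `marginal_cost` are hypothetical.

```python
# ImageNet V2 top-1 accuracy (%) and ImageNet GFLOPs, copied from the results table.
RESULTS = {
    "LeViT-128S": (63.9, 0.288),
    "LeViT-128": (67.5, 0.376),
    "LeViT-192": (68.7, 0.624),
    "LeViT-256": (69.9, 1.066),
    "LeViT-384": (71.4, 2.334),
}


def marginal_cost(results):
    """Sort models by GFLOPs and, for each consecutive pair, return
    (smaller model, larger model, extra GFLOPs per extra top-1 point)."""
    ordered = sorted(results.items(), key=lambda item: item[1][1])
    steps = []
    for (name_a, (acc_a, gf_a)), (name_b, (acc_b, gf_b)) in zip(ordered, ordered[1:]):
        steps.append((name_a, name_b, (gf_b - gf_a) / (acc_b - acc_a)))
    return steps


if __name__ == "__main__":
    for small, large, cost in marginal_cost(RESULTS):
        print(f"{small} -> {large}: {cost:.3f} GFLOPs per extra top-1 point")
```

With these numbers the cost per point rises at every step (roughly 0.024, 0.207, 0.368, and 0.845 GFLOPs per point), so the larger variants buy accuracy at steeply increasing compute.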
