Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Big Transfer (BiT): General Visual Representation Learning

Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, Neil Houlsby

Published: 2019-12-24
Venue: ECCV 2020
Tasks: Few-Shot Learning, Image Classification, Representation Learning, Out-of-Distribution Generalization, Fine-Grained Image Classification

Abstract

Transfer of pre-trained representations improves sample efficiency and simplifies hyperparameter tuning when training deep neural networks for vision. We revisit the paradigm of pre-training on large supervised datasets and fine-tuning the model on a target task. We scale up pre-training, and propose a simple recipe that we call Big Transfer (BiT). By combining a few carefully selected components, and transferring using a simple heuristic, we achieve strong performance on over 20 datasets. BiT performs well across a surprisingly wide range of data regimes -- from 1 example per class to 1M total examples. BiT achieves 87.5% top-1 accuracy on ILSVRC-2012, 99.4% on CIFAR-10, and 76.3% on the 19 task Visual Task Adaptation Benchmark (VTAB). On small datasets, BiT attains 76.8% on ILSVRC-2012 with 10 examples per class, and 97.0% on CIFAR-10 with 10 examples per class. We conduct detailed analysis of the main components that lead to high transfer performance.
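
The abstract reports low-data results "with 10 examples per class". As a minimal sketch of how such a k-shot subset could be drawn from a labelled dataset, the snippet below picks exactly k indices per class. The function name `sample_k_shot` and the seeding scheme are illustrative assumptions; the paper's actual sampling procedure is not described on this page.

```python
import random
from collections import defaultdict


def sample_k_shot(labels, k, seed=0):
    """Return sorted dataset indices with exactly k examples per class.

    Hypothetical helper for illustration only; not taken from the BiT code.
    """
    # Group example indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    chosen = []
    for label in sorted(by_class):
        indices = by_class[label]
        if len(indices) < k:
            raise ValueError(
                f"class {label!r} has only {len(indices)} examples, need {k}"
            )
        # Draw k distinct examples of this class without replacement.
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


# Toy dataset: 1000 examples spread evenly over 10 classes.
labels = [i % 10 for i in range(1000)]
subset = sample_k_shot(labels, k=10)
```

With 10 classes and k=10 the subset holds 100 indices, the same size as the CIFAR-10 10-shot setting mentioned in the abstract. A model would then be fine-tuned on only those examples and evaluated on the full test set.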

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Classification | OmniBenchmark | Average Top-1 Accuracy | 40.4 | BiT-M |
| Image Classification | ObjectNet | Top-1 Accuracy | 58.7 | BiT-L (ResNet-152x4) |
| Image Classification | ObjectNet | Top-5 Accuracy | 80 | BiT-L (ResNet-152x4) |
| Image Classification | ObjectNet | Top-1 Accuracy | 47 | BiT-M (ResNet-152x4) |
| Image Classification | ObjectNet | Top-5 Accuracy | 69 | BiT-M (ResNet-152x4) |
| Image Classification | ObjectNet | Top-1 Accuracy | 36 | BiT-S (ResNet-152x4) |
| Image Classification | ObjectNet | Top-5 Accuracy | 57 | BiT-S (ResNet-152x4) |
| Image Classification | CIFAR-10 | Percentage correct | 99.37 | BiT-L (ResNet) |
| Image Classification | CIFAR-10 | Percentage correct | 98.91 | BiT-M (ResNet) |
| Image Classification | VTAB-1k | Top-1 Accuracy | 78.72 | BiT-L (50 hypers/task) |
| Image Classification | VTAB-1k | Top-1 Accuracy | 76.3 | BiT-L |
| Image Classification | VTAB-1k | Top-1 Accuracy | 70.6 | BiT-M |
| Image Classification | VTAB-1k | Top-1 Accuracy | 66.9 | BiT-S |
| Image Classification | Flowers-102 | Accuracy | 99.63 | BiT-L (ResNet) |
| Image Classification | Flowers-102 | Accuracy | 99.3 | BiT-M (ResNet) |
| Image Classification | ObjectNet (Bounding Box) | Top 5 Accuracy | 85.1 | BiT-L (ResNet) |
| Image Classification | ObjectNet (Bounding Box) | Top 5 Accuracy | 76 | BiT-M (ResNet) |
| Image Classification | ObjectNet (Bounding Box) | Top 5 Accuracy | 64.4 | BiT-S (ResNet) |
| Image Classification | CIFAR-100 | Percentage correct | 93.51 | BiT-L (ResNet) |
| Image Classification | CIFAR-100 | Percentage correct | 92.17 | BiT-M (ResNet) |
| Image Classification | ImageNet | Top 5 Accuracy | 98.46 | BiT-L (ResNet) |
| Image Classification | Oxford-IIIT Pets | Accuracy | 96.62 | BiT-L (ResNet) |
| Image Classification | Oxford-IIIT Pets | Accuracy | 94.47 | BiT-M (ResNet) |
| Image Classification | Oxford 102 Flowers | Top-1 Error Rate | 0.37 | BiT-L (ResNet) |
| Image Classification | Oxford 102 Flowers | Top-1 Error Rate | 0.7 | BiT-M (ResNet) |
| Fine-Grained Image Classification | Oxford-IIIT Pets | Accuracy | 96.62 | BiT-L (ResNet) |
| Fine-Grained Image Classification | Oxford-IIIT Pets | Accuracy | 94.47 | BiT-M (ResNet) |
| Fine-Grained Image Classification | Oxford 102 Flowers | Top-1 Error Rate | 0.37 | BiT-L (ResNet) |
| Fine-Grained Image Classification | Oxford 102 Flowers | Top-1 Error Rate | 0.7 | BiT-M (ResNet) |
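
The results above mix top-1 accuracy, top-5 accuracy, and top-1 error rate. As a hedged illustration of how these metrics relate, the sketch below computes top-k accuracy from per-class scores in plain Python. The function `top_k_accuracy` and the toy scores are assumptions made for this example, not the evaluation code behind the reported numbers.

```python
def top_k_accuracy(scores, targets, k=1):
    """Percentage of examples whose true label is among the k highest scores.

    scores:  list of per-example lists, one score per class
    targets: list of true class indices, same length as scores
    """
    if len(scores) != len(targets):
        raise ValueError("scores and targets must have the same length")
    if not targets:
        raise ValueError("need at least one example")

    hits = 0
    for row, target in zip(scores, targets):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(targets)


# Toy example: 4 images, 3 classes.
scores = [
    [0.10, 0.70, 0.20],  # highest score: class 1
    [0.50, 0.30, 0.20],  # highest score: class 0
    [0.20, 0.30, 0.50],  # highest score: class 2
    [0.40, 0.35, 0.25],  # highest score: class 0
]
targets = [1, 1, 2, 2]

top1 = top_k_accuracy(scores, targets, k=1)  # 50.0
top2 = top_k_accuracy(scores, targets, k=2)  # 75.0
top1_error = 100.0 - top1                    # 50.0
```

Top-1 error rate is 100 minus top-1 accuracy, which matches the table: the Flowers-102 accuracy of 99.63 and the Oxford 102 Flowers error rate of 0.37 sum to 100, so the two rows appear to report the same result in different forms.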