Fixing the train-test resolution discrepancy

Hugo Touvron, Andrea Vedaldi, Matthijs Douze, Hervé Jégou

2019-06-14NeurIPS 2019 12Image Classification Data Augmentation General Classification Fine-Grained Image Classification

Paper PDF Code(official)Code Code

Abstract

Data-augmentation is key to the training of neural networks for image classification. This paper first shows that existing augmentations induce a significant discrepancy between the typical size of the objects seen by the classifier at train and test time. We experimentally validate that, for a target test resolution, using a lower train resolution offers better classification at test time. We then propose a simple yet effective and efficient strategy to optimize the classifier performance when the train and test resolutions differ. It involves only a computationally cheap fine-tuning of the network at the test resolution. This enables training strong classifiers using small training images. For instance, we obtain 77.1% top-1 accuracy on ImageNet with a ResNet-50 trained on 128x128 images, and 79.8% with one trained on 224x224 image. In addition, if we use extra training data we get 82.5% with the ResNet-50 train with 224x224 images. Conversely, when training a ResNeXt-101 32x48d pre-trained in weakly-supervised fashion on 940 million public images at resolution 224x224 and further optimizing for test resolution 320x320, we obtain a test top-1 accuracy of 86.4% (top-5: 98.0%) (single-crop). To the best of our knowledge this is the highest ImageNet single-crop, top-1 and top-5 accuracy to date.

Results

Task	Dataset	Metric	Value	Model
Image Classification	iNaturalist	Top 1 Accuracy	75.4	FixSENet-154
Image Classification	CUB-200-2011	Accuracy	88.7	FixSENet-154
Fine-Grained Image Classification	CUB-200-2011	Accuracy	88.7	FixSENet-154

Related Papers

Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations2025-07-18 Adversarial attacks to image classification systems using evolutionary algorithms2025-07-17 Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy2025-07-17 Federated Learning for Commercial Image Sources2025-07-17 MUPAX: Multidimensional Problem Agnostic eXplainable AI2025-07-17 Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management2025-07-17 Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images2025-07-17 Similarity-Guided Diffusion for Contrastive Sequential Recommendation2025-07-16