Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Improving Image Recognition by Retrieving from Web-Scale Image-Text Data

Ahmet Iscen, Alireza Fathi, Cordelia Schmid

2023-04-11 · CVPR 2023 · Image Classification · Long-tail Learning · Learning with noisy labels

Abstract

Retrieval-augmented models are becoming increasingly popular for computer vision tasks after their recent success in NLP problems. The goal is to enhance the recognition capabilities of the model by retrieving similar examples for the visual input from an external memory set. In this work, we introduce an attention-based memory module, which learns the importance of each retrieved example from the memory. Compared to existing approaches, our method removes the influence of irrelevant retrieved examples and retains those that are beneficial to the input query. We also thoroughly study various ways of constructing the memory dataset. Our experiments show the benefit of using a massive-scale memory dataset of 1B image-text pairs, and demonstrate the performance of different memory representations. We evaluate our method on three different classification tasks, namely long-tailed recognition, learning with noisy labels, and fine-grained classification, and show that it achieves state-of-the-art accuracies on the ImageNet-LT, Places-LT, and WebVision datasets.
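
The abstract describes weighting each retrieved example by its relevance to the query. As a rough illustration only, the sketch below uses plain scaled dot-product attention over a handful of retrieved memory embeddings. It is not the authors' architecture: the function name `attend_over_memory`, the toy vectors, and the absence of learned projections are all assumptions made for this example.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend_over_memory(query, mem_keys, mem_values):
    """Hypothetical helper: weight retrieved memory entries by similarity to the query.

    Assumption: scaled dot-product attention with no learned projections.
    A trained module would learn these weights; here they come from raw similarity.
    """
    scale = math.sqrt(len(query))
    scores = [dot(query, k) / scale for k in mem_keys]
    weights = softmax(scores)
    dim = len(mem_values[0])
    # Weighted sum of the value vectors, one output dimension at a time.
    context = [sum(w * v[i] for w, v in zip(weights, mem_values)) for i in range(dim)]
    return weights, context


# Toy data (made up): two retrieved items resemble the query, one does not.
query = [1.0, 0.0]
mem_keys = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]]
mem_values = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

weights, context = attend_over_memory(query, mem_keys, mem_values)
print(weights)
print(context)
```

In this toy setup the dissimilar third entry receives the smallest weight, so it contributes least to the aggregated vector. That is the general effect the abstract attributes to learning the importance of each retrieved example.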

Results

Task                                 Dataset         Metric          Value  Model
Image Classification                 WebVision-1000  Top-1 Accuracy  83.6   MAM (ViT-B/16)
Image Classification                 Places-LT       Top-1 Accuracy  51.4   MAM (ViT-B/16)
Image Classification                 ImageNet-LT     Top-1 Accuracy  82.3   MAM (ViT-B/16)
Few-Shot Image Classification        Places-LT       Top-1 Accuracy  51.4   MAM (ViT-B/16)
Few-Shot Image Classification        ImageNet-LT     Top-1 Accuracy  82.3   MAM (ViT-B/16)
Generalized Few-Shot Classification  Places-LT       Top-1 Accuracy  51.4   MAM (ViT-B/16)
Generalized Few-Shot Classification  ImageNet-LT     Top-1 Accuracy  82.3   MAM (ViT-B/16)
Long-tail Learning                   Places-LT       Top-1 Accuracy  51.4   MAM (ViT-B/16)
Long-tail Learning                   ImageNet-LT     Top-1 Accuracy  82.3   MAM (ViT-B/16)
Generalized Few-Shot Learning        Places-LT       Top-1 Accuracy  51.4   MAM (ViT-B/16)
Generalized Few-Shot Learning        ImageNet-LT     Top-1 Accuracy  82.3   MAM (ViT-B/16)
