Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition

Changyao Tian, Wenhai Wang, Xizhou Zhu, Jifeng Dai, Yu Qiao

2021-11-26 | Image Classification, Long-tail Learning, Transfer Learning

Abstract

Deep learning-based models encounter challenges when processing long-tailed data in the real world. Existing solutions usually employ some balancing strategies or transfer learning to deal with the class imbalance problem, based on the image modality. In this work, we present a visual-linguistic long-tailed recognition framework, termed VL-LTR, and conduct empirical studies on the benefits of introducing text modality for long-tailed recognition (LTR). Compared to existing approaches, the proposed VL-LTR has the following merits. (1) Our method can not only learn visual representation from images but also learn corresponding linguistic representation from noisy class-level text descriptions collected from the Internet; (2) Our method can effectively use the learned visual-linguistic representation to improve the visual recognition performance, especially for classes with fewer image samples. We also conduct extensive experiments and set the new state-of-the-art performance on widely-used LTR benchmarks. Notably, our method achieves 77.2% overall accuracy on ImageNet-LT, which significantly outperforms the previous best method by over 17 points, and is close to the prevailing performance training on the full ImageNet. Code is available at https://github.com/ChangyaoTian/VL-LTR.
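To make the idea of class-level linguistic representations concrete, the following is a minimal sketch and not the authors' implementation. It assumes each class already has one text embedding and each image has one embedding in the same space, and it assigns every image to the class whose text embedding has the highest cosine similarity. The function names, the random stand-in embeddings, and the use of NumPy are all assumptions made for illustration; the official code is at the repository linked in the abstract.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so that dot products become cosine similarities."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def classify_by_text_similarity(image_emb, class_text_emb):
    """Assign each image to the class with the most similar text embedding.

    image_emb:      (N, D) array, one embedding per image (hypothetical encoder output).
    class_text_emb: (C, D) array, one embedding per class description (hypothetical).
    Returns (pred, sims): predicted class indices (N,) and cosine similarities (N, C).
    """
    img = l2_normalize(np.asarray(image_emb, dtype=float))
    txt = l2_normalize(np.asarray(class_text_emb, dtype=float))
    sims = img @ txt.T
    return sims.argmax(axis=1), sims


def top1_accuracy(pred, labels):
    """Fraction of predictions that match the labels (the metric reported in Results)."""
    return float(np.mean(np.asarray(pred) == np.asarray(labels)))


if __name__ == "__main__":
    # Stand-in data: random class embeddings and images that are noisy copies of them.
    rng = np.random.default_rng(0)
    num_classes, dim = 5, 16
    class_text_emb = rng.normal(size=(num_classes, dim))
    labels = np.array([0, 1, 2, 3, 4, 4])
    image_emb = class_text_emb[labels] + 0.01 * rng.normal(size=(labels.size, dim))

    pred, sims = classify_by_text_similarity(image_emb, class_text_emb)
    print("top-1 accuracy:", top1_accuracy(pred, labels))
```

Because the class-side embeddings come from text rather than from image counts, a class with few training images still gets a usable prototype in this sketch, which is the intuition behind the abstract's claim about classes with fewer image samples.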

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Classification | Places-LT | Top-1 Accuracy | 50.1 | VL-LTR (ViT-B-16) |
| Image Classification | Places-LT | Top-1 Accuracy | 48 | VL-LTR (ResNet-50) |
| Image Classification | ImageNet-LT | Top-1 Accuracy | 77.2 | VL-LTR (ViT-B-16) |
| Image Classification | ImageNet-LT | Top-1 Accuracy | 70.1 | VL-LTR (ResNet-50) |
| Few-Shot Image Classification | Places-LT | Top-1 Accuracy | 50.1 | VL-LTR (ViT-B-16) |
| Few-Shot Image Classification | Places-LT | Top-1 Accuracy | 48 | VL-LTR (ResNet-50) |
| Few-Shot Image Classification | ImageNet-LT | Top-1 Accuracy | 77.2 | VL-LTR (ViT-B-16) |
| Few-Shot Image Classification | ImageNet-LT | Top-1 Accuracy | 70.1 | VL-LTR (ResNet-50) |
| Generalized Few-Shot Classification | Places-LT | Top-1 Accuracy | 50.1 | VL-LTR (ViT-B-16) |
| Generalized Few-Shot Classification | Places-LT | Top-1 Accuracy | 48 | VL-LTR (ResNet-50) |
| Generalized Few-Shot Classification | ImageNet-LT | Top-1 Accuracy | 77.2 | VL-LTR (ViT-B-16) |
| Generalized Few-Shot Classification | ImageNet-LT | Top-1 Accuracy | 70.1 | VL-LTR (ResNet-50) |
| Long-tail Learning | Places-LT | Top-1 Accuracy | 50.1 | VL-LTR (ViT-B-16) |
| Long-tail Learning | Places-LT | Top-1 Accuracy | 48 | VL-LTR (ResNet-50) |
| Long-tail Learning | ImageNet-LT | Top-1 Accuracy | 77.2 | VL-LTR (ViT-B-16) |
| Long-tail Learning | ImageNet-LT | Top-1 Accuracy | 70.1 | VL-LTR (ResNet-50) |
| Generalized Few-Shot Learning | Places-LT | Top-1 Accuracy | 50.1 | VL-LTR (ViT-B-16) |
| Generalized Few-Shot Learning | Places-LT | Top-1 Accuracy | 48 | VL-LTR (ResNet-50) |
| Generalized Few-Shot Learning | ImageNet-LT | Top-1 Accuracy | 77.2 | VL-LTR (ViT-B-16) |
| Generalized Few-Shot Learning | ImageNet-LT | Top-1 Accuracy | 70.1 | VL-LTR (ResNet-50) |

Related Papers

Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations (2025-07-18)
RaMen: Multi-Strategy Multi-Modal Learning for Bundle Construction (2025-07-18)
Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy (2025-07-17)
Federated Learning for Commercial Image Sources (2025-07-17)
MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
Disentangling coincident cell events using deep transfer learning and compressive sensing (2025-07-17)
Best Practices for Large-Scale, Pixel-Wise Crop Mapping and Transfer Learning Workflows (2025-07-16)