Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Query2Label: A Simple Transformer Way to Multi-Label Classification

Shilong Liu, Lei Zhang, Xiao Yang, Hang Su, Jun Zhu

2021-07-22 | Classification, Multi-Label Classification

Abstract

This paper presents a simple and effective approach to solving the multi-label classification problem. The proposed approach leverages Transformer decoders to query the existence of a class label. The use of Transformer is rooted in the need of extracting local discriminative features adaptively for different labels, which is a strongly desired property due to the existence of multiple objects in one image. The built-in cross-attention module in the Transformer decoder offers an effective way to use label embeddings as queries to probe and pool class-related features from a feature map computed by a vision backbone for subsequent binary classifications. Compared with prior works, the new framework is simple, using standard Transformers and vision backbones, and effective, consistently outperforming all previous works on five multi-label classification data sets, including MS-COCO, PASCAL VOC, NUS-WIDE, and Visual Genome. Particularly, we establish $91.3\%$ mAP on MS-COCO. We hope its compact structure, simple implementation, and superior performance serve as a strong baseline for multi-label classification tasks and future studies. The code will be available soon at https://github.com/SlongLiu/query2labels.
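As a rough illustration of the querying idea described in the abstract, the sketch below uses one embedding per label as a query, attends over the flattened backbone feature map, pools a class-specific feature, and applies an independent binary classifier per label. It is a simplification under stated assumptions, not the authors' implementation: it uses a single attention head with no learned query/key/value projections, no feed-forward layers, no normalization, and no stacked decoder layers, and random numbers stand in for learned parameters and backbone features. The names `query_labels`, `label_queries`, `class_weights`, and `class_biases` are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def query_labels(label_queries, features, class_weights, class_biases):
    """Single-head cross-attention pooling followed by per-label sigmoids.

    label_queries: K x d list, one embedding per class label (assumed learned).
    features:      N x d list, a flattened H*W feature map from a backbone.
    Returns (probs, attn_maps): K probabilities and K attention maps of length N.
    """
    d = len(features[0])
    scale = 1.0 / math.sqrt(d)
    probs, attn_maps = [], []
    for q, w, b in zip(label_queries, class_weights, class_biases):
        # Attention of this label's query over every spatial position.
        attn = softmax([dot(q, f) * scale for f in features])
        # Class-specific pooled feature: attention-weighted sum of positions.
        pooled = [sum(a * f[j] for a, f in zip(attn, features)) for j in range(d)]
        # Independent binary classifier for this label.
        logit = dot(w, pooled) + b
        probs.append(1.0 / (1.0 + math.exp(-logit)))
        attn_maps.append(attn)
    return probs, attn_maps


# Toy sizes: 3 labels, a 4x4 feature map (16 positions), 8 channels.
random.seed(0)
K, N, d = 3, 16, 8
features = [[random.gauss(0, 1) for _ in range(d)] for _ in range(N)]
label_queries = [[random.gauss(0, 1) for _ in range(d)] for _ in range(K)]
class_weights = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(K)]
class_biases = [0.0] * K

probs, attn_maps = query_labels(label_queries, features, class_weights, class_biases)
print(len(probs), len(attn_maps[0]))
```

Because each label has its own query, each label also gets its own attention map over the image, which is what allows different labels to pool features from different regions.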

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Multi-Label Classification | PASCAL VOC 2012 | mAP | 96.2 | Q2L-TResL (448 resolution) |
| Multi-Label Classification | MS-COCO | mAP | 91.3 | Q2L-CvT (ImageNet-21K pretraining, resolution 384) |
| Multi-Label Classification | MS-COCO | mAP | 90.5 | Q2L-SwinL (ImageNet-21K pretraining, resolution 384) |
| Multi-Label Classification | MS-COCO | mAP | 90.3 | Q2L-TResL (ImageNet-21K pretraining, resolution 640) |
| Multi-Label Classification | MS-COCO | mAP | 84.9 | Q2L-R101 (resolution 448) |
| Multi-Label Classification | NUS-WIDE | mAP | 70.1 | Q2L-CvT (resolution 384, ImageNet-21K pretrained) |
| Multi-Label Classification | NUS-WIDE | mAP | 66.3 | Q2L-TResL (resolution 448) |
| Multi-Label Classification | NUS-WIDE | mAP | 65 | Q2L-R101 (resolution 448) |
| Multi-Label Classification | PASCAL VOC 2007 | mAP | 97.3 | Q2L-CvT (ImageNet-21K pretrained, resolution 384) |
| Multi-Label Classification | PASCAL VOC 2007 | mAP | 96.9 | Q2L-TResL (ImageNet-21K pretrained, resolution 448) |
| Multi-Label Classification | PASCAL VOC 2007 | mAP | 96.1 | Q2L-TResL (resolution 448) |
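
For readers unfamiliar with the mAP metric reported above, here is a minimal sketch of one common way to compute it for multi-label classification: average precision is computed per class from the ranked scores, then averaged over classes. This is an assumption about the general form of the metric, not the exact evaluation code behind these numbers; individual benchmarks differ in detail (for example, some PASCAL VOC protocols use interpolated precision). The function names and the toy data are hypothetical.

```python
def average_precision(scores, labels):
    """Non-interpolated AP for one class.

    scores: predicted confidences, one per image.
    labels: 1 if the class is present in that image, else 0.
    """
    # Rank images from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, precision_sum = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            # Precision at the rank of each positive image.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(score_matrix, label_matrix):
    """mAP in percent; both matrices are shaped images x classes."""
    num_classes = len(score_matrix[0])
    aps = []
    for c in range(num_classes):
        scores = [row[c] for row in score_matrix]
        labels = [row[c] for row in label_matrix]
        aps.append(average_precision(scores, labels))
    return 100.0 * sum(aps) / len(aps)


# Toy data: 4 images, 2 classes (illustrative values only).
toy_scores = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6], [0.1, 0.4]]
toy_labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
toy_map = mean_average_precision(toy_scores, toy_labels)
print(round(toy_map, 2))  # 91.67
```

In the toy data, class 0 has positives at ranks 1 and 3 (AP = (1/1 + 2/3) / 2), and class 1 has positives at ranks 1 and 2 (AP = 1), so the mean is about 91.67.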

Related Papers

Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)
Safeguarding Federated Learning-based Road Condition Classification (2025-07-16)
AI-Enhanced Pediatric Pneumonia Detection: A CNN-Based Approach Using Data Augmentation and Generative Adversarial Networks (GANs) (2025-07-13)
Fuzzy Classification Aggregation for a Continuum of Agents (2025-07-06)
Hybrid-View Attention for csPCa Classification in TRUS (2025-07-04)
Devising a solution to the problems of Cancer awareness in Telangana (2025-06-26)
A Semi-supervised Scalable Unified Framework for E-commerce Query Classification (2025-06-26)