TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/Texts as Images in Prompt Tuning for Multi-Label Image Rec...

Texts as Images in Prompt Tuning for Multi-Label Image Recognition

Zixian Guo, Bowen Dong, Zhilong Ji, Jinfeng Bai, Yiwen Guo, WangMeng Zuo

2022-11-23CVPR 2023 1Multi-label Image Recognition with Partial LabelsContrastive Learning
PaperPDFCode(official)

Abstract

Prompt tuning has been employed as an efficient way to adapt large vision-language pre-trained models (e.g. CLIP) to various downstream tasks in data-limited or label-limited settings. Nonetheless, visual data (e.g., images) is by default prerequisite for learning prompts in existing methods. In this work, we advocate that the effectiveness of image-text contrastive learning in aligning the two modalities (for training CLIP) further makes it feasible to treat texts as images for prompt tuning and introduce TaI prompting. In contrast to the visual data, text descriptions are easy to collect, and their class labels can be directly derived. Particularly, we apply TaI prompting to multi-label image recognition, where sentences in the wild serve as alternatives to images for prompt tuning. Moreover, with TaI, double-grained prompt tuning (TaI-DPT) is further presented to extract both coarse-grained and fine-grained embeddings for enhancing the multi-label recognition performance. Experimental results show that our proposed TaI-DPT outperforms zero-shot CLIP by a large margin on multiple benchmarks, e.g., MS-COCO, VOC2007, and NUS-WIDE, while it can be combined with existing methods of prompting from images to improve recognition performance further. Code is released at https://github.com/guozix/TaI-DPT.

Results

TaskDatasetMetricValueModel
Multi-Label Image ClassificationMS-COCO-2014Average mAP83.6DualCoOp+TaI-DPT
Multi-Label Image ClassificationPASCAL VOC 2007Average mAP94.8DualCoOp+TaI-DPT
Image ClassificationMS-COCO-2014Average mAP83.6DualCoOp+TaI-DPT
Image ClassificationPASCAL VOC 2007Average mAP94.8DualCoOp+TaI-DPT
2D ClassificationMS-COCO-2014Average mAP83.6DualCoOp+TaI-DPT
2D ClassificationPASCAL VOC 2007Average mAP94.8DualCoOp+TaI-DPT

Related Papers

SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts2025-07-17HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals2025-07-17Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management2025-07-17SGCL: Unifying Self-Supervised and Supervised Learning for Graph Recommendation2025-07-17Similarity-Guided Diffusion for Contrastive Sequential Recommendation2025-07-16LLM-Driven Dual-Level Multi-Interest Modeling for Recommendation2025-07-15Latent Space Consistency for Sparse-View CT Reconstruction2025-07-15Self-supervised pretraining of vision transformers for animal behavioral analysis and neural encoding2025-07-13