
Conditional Prompt Learning for Vision-Language Models

Kaiyang Zhou, Jingkang Yang, Chen Change Loy, Ziwei Liu

Published 2022-03-10 · CVPR 2022 · Tasks: Prompt Engineering, Domain Generalization
Links: Paper · PDF · Code (official: https://github.com/KaiyangZhou/CoOp), plus several community code links

Abstract

With the rise of powerful pre-trained vision-language models like CLIP, it becomes essential to investigate ways to adapt these models to downstream datasets. A recently proposed method named Context Optimization (CoOp) introduces the concept of prompt learning -- a recent trend in NLP -- to the vision domain for adapting pre-trained vision-language models. Specifically, CoOp turns context words in a prompt into a set of learnable vectors and, with only a few labeled images for learning, can achieve huge improvements over intensively-tuned manual prompts. In our study, we identify a critical problem of CoOp: the learned context is not generalizable to wider unseen classes within the same dataset, suggesting that CoOp overfits base classes observed during training. To address the problem, we propose Conditional Context Optimization (CoCoOp), which extends CoOp by further learning a lightweight neural network to generate for each image an input-conditional token (vector). Compared to CoOp's static prompts, our dynamic prompts adapt to each instance and are thus less sensitive to class shift. Extensive experiments show that CoCoOp generalizes much better than CoOp to unseen classes, even showing promising transferability beyond a single dataset, and yields stronger domain generalization performance as well. Code is available at https://github.com/KaiyangZhou/CoOp.
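The conditional mechanism described in the abstract can be summarized in a few lines: a small network maps each image's feature to a token that shifts the shared learnable context vectors. The PyTorch sketch below is illustrative only and is based on the abstract, not the official repository; names such as MetaNet, ConditionalContext, n_ctx, and the 16x bottleneck are assumptions.

```python
import torch
import torch.nn as nn


class MetaNet(nn.Module):
    # Lightweight network that maps an image feature to one conditional token (assumed 2-layer MLP).
    def __init__(self, feat_dim, ctx_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, feat_dim // 16),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim // 16, ctx_dim),
        )

    def forward(self, image_features):
        return self.net(image_features)  # (batch, ctx_dim)


class ConditionalContext(nn.Module):
    # CoOp-style learnable context vectors, shifted per image by the Meta-Net token.
    def __init__(self, n_ctx, ctx_dim, feat_dim):
        super().__init__()
        self.ctx = nn.Parameter(torch.randn(n_ctx, ctx_dim) * 0.02)  # static prompt, as in CoOp
        self.meta_net = MetaNet(feat_dim, ctx_dim)

    def forward(self, image_features):
        bias = self.meta_net(image_features)              # (batch, ctx_dim)
        return self.ctx.unsqueeze(0) + bias.unsqueeze(1)  # (batch, n_ctx, ctx_dim)


# Toy usage: features would normally come from CLIP's frozen image encoder.
feats = torch.randn(4, 512)
prompts = ConditionalContext(n_ctx=4, ctx_dim=512, feat_dim=512)(feats)
print(prompts.shape)  # torch.Size([4, 4, 512])
```

The instance-conditioned context would then be concatenated with each tokenized class name and passed through CLIP's frozen text encoder to score classes; that step is omitted here.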

Results

Task               | Dataset                 | Metric             | Value | Model
Prompt Engineering | ImageNet-R              | Top-1 accuracy (%) | 76.18 | CoCoOp
Prompt Engineering | Stanford Cars           | Harmonic mean      | 72.01 | CoCoOp
Prompt Engineering | Oxford 102 Flower       | Harmonic mean      | 81.71 | CoCoOp
Prompt Engineering | EuroSAT                 | Harmonic mean      | 71.21 | CoCoOp
Prompt Engineering | Oxford-IIIT Pet Dataset | Harmonic mean      | 96.43 | CoCoOp
Prompt Engineering | ImageNet-S              | Top-1 accuracy (%) | 48.75 | CoCoOp
Prompt Engineering | DTD                     | Harmonic mean      | 64.85 | CoCoOp
Prompt Engineering | UCF101                  | Harmonic mean      | 77.64 | CoCoOp
Prompt Engineering | Food-101                | Harmonic mean      | 90.99 | CoCoOp
Prompt Engineering | Caltech-101             | Harmonic mean      | 95.84 | CoCoOp
Prompt Engineering | ImageNet                | Harmonic mean      | 73.1  | CoCoOp
Prompt Engineering | FGVC-Aircraft           | Harmonic mean      | 27.74 | CoCoOp
Prompt Engineering | SUN397                  | Harmonic mean      | 78.27 | CoCoOp
Prompt Engineering | ImageNet-A              | Top-1 accuracy (%) | 50.63 | CoCoOp
Prompt Engineering | ImageNet V2             | Top-1 accuracy (%) | 64.07 | CoCoOp
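In the "Harmonic mean" rows, the metric follows the paper's base-to-new generalization protocol: accuracy on a dataset's base (seen) classes and its new (unseen) classes are combined with a harmonic mean, which penalizes a large gap between the two. A minimal sketch, with made-up accuracies purely for illustration:

```python
def harmonic_mean(base_acc, new_acc):
    # Harmonic mean of base-class and new-class accuracy (both in %).
    return 2 * base_acc * new_acc / (base_acc + new_acc)


print(round(harmonic_mean(75.0, 70.0), 2))  # 72.41
```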

Related Papers

Leveraging Language Prior for Infrared Small Target Detection (2025-07-17)
Emotional Support with LLM-based Empathetic Dialogue Generation (2025-07-17)
Simulate, Refocus and Ensemble: An Attention-Refocusing Scheme for Domain Generalization (2025-07-17)
GLAD: Generalizable Tuning for Vision-Language Models (2025-07-17)
MoTM: Towards a Foundation Model for Time Series Imputation based on Continuous Modeling (2025-07-17)
InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing (2025-07-16)
Prompt Engineering in Segment Anything Model: Methodologies, Applications, and Emerging Challenges (2025-07-13)
From Physics to Foundation Models: A Review of AI-Driven Quantitative Remote Sensing Inversion (2025-07-11)