Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Learning Domain Invariant Prompt for Vision-Language Models

Cairong Zhao, Yubin Wang, Xinyang Jiang, Yifei Shen, Kaitao Song, Dongsheng Li, Duoqian Miao

2022-12-08 · Meta-Learning · Prompt Engineering · Domain Generalization · Language Modelling

Paper · PDF · Code

Abstract

Prompt learning is one of the most effective and trending ways to adapt powerful vision-language foundation models like CLIP to downstream datasets by tuning learnable prompt vectors with very few samples. However, although prompt learning achieves excellent performance on in-domain data, it still faces the major challenge of generalizing to unseen classes and domains. Some existing prompt learning methods tackle this issue by adaptively generating different prompts for different tokens or domains, but neglect the ability of the learned prompts to generalize to unseen domains. In this paper, we propose a novel prompt learning paradigm, called MetaPrompt, that directly generates domain-invariant prompts that generalize to unseen domains. Specifically, a dual-modality prompt tuning network is proposed to generate prompts for input from both the image and text modalities. With a novel asymmetric contrastive loss, the representation from the original pre-trained vision-language model acts as supervision to enhance the generalization ability of the learned prompt. More importantly, we propose a meta-learning-based prompt tuning algorithm that explicitly constrains the task-specific prompt tuned for one domain or class to also achieve good performance in another domain or class. Extensive experiments on 11 datasets for base-to-new generalization and 4 datasets for domain generalization demonstrate that our method consistently and significantly outperforms existing methods.
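The meta-learning constraint described above (a prompt adapted on one domain must also perform well on another) can be sketched as a first-order episodic update. The snippet below is a minimal, hypothetical illustration on a toy linear "prompt" with synthetic domains; it is not the paper's implementation, and all names and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_and_grad(prompt, X, y):
    """Mean squared loss of a toy linear scorer X @ prompt, plus its gradient."""
    err = X @ prompt - y
    return float(err @ err) / len(y), 2.0 * X.T @ err / len(y)

# Two synthetic "domains" sharing one underlying target direction,
# standing in for a support domain and a held-out query domain.
true_w = np.array([1.0, -2.0, 0.5])
X_support = rng.normal(size=(32, 3)); y_support = X_support @ true_w
X_query = 1.5 * rng.normal(size=(32, 3)); y_query = X_query @ true_w

prompt = np.zeros(3)
inner_lr, meta_lr = 0.1, 0.1
for _ in range(200):
    # Inner step: adapt the prompt on the support domain.
    _, g_support = loss_and_grad(prompt, X_support, y_support)
    adapted = prompt - inner_lr * g_support
    # Meta step (first-order): the adapted prompt must also fit the query domain.
    _, g_query = loss_and_grad(adapted, X_query, y_query)
    prompt -= meta_lr * g_query

final_loss, _ = loss_and_grad(prompt, X_query, y_query)
print(final_loss)
```

Because the meta gradient is taken at the adapted parameters, the update favors prompts that remain good after domain-specific tuning, which is the domain-invariance pressure the abstract describes.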

Results

Task | Dataset | Metric | Value | Model
--- | --- | --- | --- | ---
Prompt Engineering | Stanford Cars | Harmonic mean | 75.48 | MetaPrompt
Prompt Engineering | Oxford 102 Flower | Harmonic mean | 84.52 | MetaPrompt
Prompt Engineering | EuroSAT | Harmonic mean | 83.38 | MetaPrompt
Prompt Engineering | Oxford-IIIT Pet Dataset | Harmonic mean | 96.26 | MetaPrompt
Prompt Engineering | DTD | Harmonic mean | 68.35 | MetaPrompt
Prompt Engineering | UCF101 | Harmonic mean | 81.35 | MetaPrompt
Prompt Engineering | Food-101 | Harmonic mean | 91.29 | MetaPrompt
Prompt Engineering | Caltech-101 | Harmonic mean | 96.32 | MetaPrompt
Prompt Engineering | ImageNet | Harmonic mean | 74.02 | MetaPrompt
Prompt Engineering | FGVC-Aircraft | Harmonic mean | 38.24 | MetaPrompt
Prompt Engineering | SUN397 | Harmonic mean | 80.62 | MetaPrompt

Related Papers

- Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
- Leveraging Language Prior for Infrared Small Target Detection (2025-07-17)
- Emotional Support with LLM-based Empathetic Dialogue Generation (2025-07-17)
- Simulate, Refocus and Ensemble: An Attention-Refocusing Scheme for Domain Generalization (2025-07-17)
- GLAD: Generalizable Tuning for Vision-Language Models (2025-07-17)
- MoTM: Towards a Foundation Model for Time Series Imputation based on Continuous Modeling (2025-07-17)
- Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
- VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)