Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


PromptKD: Unsupervised Prompt Distillation for Vision-Language Models

Zheng Li, Xiang Li, Xinyi Fu, Xin Zhang, Weiqiang Wang, Shuo Chen, Jian Yang

Published: 2024-03-05 · CVPR 2024
Tasks: Zero-Shot Image Classification · Prompt Engineering · Knowledge Distillation
Paper · PDF · Code (official)

Abstract

Prompt learning has emerged as a valuable technique for enhancing vision-language models (VLMs) such as CLIP on downstream tasks in specific domains. Existing work mainly focuses on designing various learning forms of prompts, neglecting the potential of prompts as effective distillers for learning from larger teacher models. In this paper, we introduce an unsupervised domain prompt distillation framework, which aims to transfer the knowledge of a larger teacher model to a lightweight target model through prompt-driven imitation using unlabeled domain images. Specifically, our framework consists of two distinct stages. In the initial stage, we pre-train a large CLIP teacher model using domain (few-shot) labels. After pre-training, we leverage the unique decoupled-modality characteristic of CLIP by pre-computing and storing the text features as class vectors, only once, through the teacher text encoder. In the subsequent stage, the stored class vectors are shared across the teacher and student image encoders for calculating the predicted logits. Further, we align the logits of the teacher and student models via KL divergence, encouraging the student image encoder to generate probability distributions similar to the teacher's through the learnable prompts. The proposed prompt distillation process eliminates the reliance on labeled data, enabling the algorithm to leverage a vast amount of unlabeled images within the domain. Finally, the well-trained student image encoder and pre-stored text features (class vectors) are utilized for inference. To the best of our knowledge, we are the first to (1) perform unsupervised domain-specific prompt-driven knowledge distillation for CLIP, and (2) establish a practical pre-storing mechanism of text features as shared class vectors between teacher and student. Extensive experiments on 11 datasets demonstrate the effectiveness of our method.
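The distillation step described in the abstract can be sketched numerically: text features are computed once as class vectors, both image encoders score images against those shared vectors, and the student is trained to match the teacher's distribution via KL divergence. Below is a minimal NumPy sketch of that objective; the feature dimension, temperature value, and all variable names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def clip_logits(image_features, class_vectors, temperature=0.01):
    """Cosine-similarity logits between image features and the pre-stored
    text features (class vectors), scaled by a CLIP-style temperature."""
    img = image_features / np.linalg.norm(image_features, axis=-1, keepdims=True)
    txt = class_vectors / np.linalg.norm(class_vectors, axis=-1, keepdims=True)
    return img @ txt.T / temperature

def kl_distillation_loss(teacher_logits, student_logits):
    """KL(teacher || student), averaged over the batch, pushing the student
    to reproduce the teacher's predicted class distribution."""
    p = softmax(teacher_logits)              # teacher probabilities
    log_q = np.log(softmax(student_logits))  # student log-probabilities
    return float((p * (np.log(p) - log_q)).sum(axis=-1).mean())

rng = np.random.default_rng(0)
class_vectors = rng.normal(size=(11, 512))   # computed once by the teacher text encoder
teacher_feats = rng.normal(size=(4, 512))    # teacher image encoder outputs
student_feats = teacher_feats + 0.1 * rng.normal(size=(4, 512))  # imperfect student

t_logits = clip_logits(teacher_feats, class_vectors)
s_logits = clip_logits(student_feats, class_vectors)
loss = kl_distillation_loss(t_logits, s_logits)
assert loss >= 0.0  # KL divergence is non-negative
```

Note how the class vectors appear only as a fixed matrix: because CLIP's modalities are decoupled, the text encoder never needs to run again during student training or inference.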

Results

Task | Dataset | Metric | Value | Model
Prompt Engineering | Stanford Cars | Harmonic mean | 83.13 | PromptKD
Prompt Engineering | Oxford 102 Flower | Harmonic mean | 90.24 | PromptKD
Prompt Engineering | EuroSAT | Harmonic mean | 89.14 | PromptKD
Prompt Engineering | Oxford-IIIT Pet Dataset | Harmonic mean | 97.15 | PromptKD
Prompt Engineering | DTD | Harmonic mean | 77.94 | PromptKD
Prompt Engineering | UCF101 | Harmonic mean | 86.10 | PromptKD
Prompt Engineering | Food-101 | Harmonic mean | 93.05 | PromptKD
Prompt Engineering | Caltech-101 | Harmonic mean | 97.77 | PromptKD
Prompt Engineering | ImageNet | Harmonic mean | 77.62 | PromptKD
Prompt Engineering | FGVC-Aircraft | Harmonic mean | 45.17 | PromptKD
Prompt Engineering | SUN397 | Harmonic mean | 82.60 | PromptKD
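In base-to-novel generalization benchmarks of this kind, the reported "Harmonic mean" is conventionally the harmonic mean of base-class and novel-class accuracy. A short sketch (the accuracy figures are made up for illustration; this is the standard metric definition, not numbers from the paper):

```python
def harmonic_mean(base_acc: float, novel_acc: float) -> float:
    """Harmonic mean of base- and novel-class accuracy, as commonly
    reported in base-to-novel generalization benchmarks."""
    return 2 * base_acc * novel_acc / (base_acc + novel_acc)

# Illustrative (made-up) numbers: the harmonic mean is pulled toward the
# smaller value, so strong base accuracy cannot mask weak novel accuracy.
print(round(harmonic_mean(90.0, 70.0), 2))  # 78.75
```

This explains why the metric is preferred over a plain average: a model that overfits the base classes at the expense of novel classes is penalized.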

Related Papers

- Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
- Leveraging Language Prior for Infrared Small Target Detection (2025-07-17)
- Emotional Support with LLM-based Empathetic Dialogue Generation (2025-07-17)
- Uncertainty-Aware Cross-Modal Knowledge Distillation with Prototype Learning for Multimodal Brain-Computer Interfaces (2025-07-17)
- DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
- HanjaBridge: Resolving Semantic Ambiguity in Korean LLMs via Hanja-Augmented Pre-Training (2025-07-15)
- Feature Distillation is the Better Choice for Model-Heterogeneous Federated Learning (2025-07-14)
- Prompt Engineering in Segment Anything Model: Methodologies, Applications, and Emerging Challenges (2025-07-13)