Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Revisiting the Power of Prompt for Visual Tuning

Yuzhu Wang, Lechao Cheng, Chaowei Fang, Dingwen Zhang, Manni Duan, Meng Wang

2024-02-04 · Visual Prompt Tuning
Paper · PDF

Abstract

Visual prompt tuning (VPT) is a promising solution that incorporates learnable prompt tokens to customize pre-trained models for downstream tasks. However, VPT and its variants often face challenges such as prompt initialization, prompt length, and subpar performance under self-supervised pretraining, hindering successful contextual adaptation. This study begins by exploring how the correlation between prompt tokens and patch tokens evolves during successful training. Inspired by the observation that prompt tokens tend to share high mutual information with patch tokens, we propose initializing prompts with downstream token prototypes. This strategic initialization, a stand-in for previous initialization schemes, substantially improves fine-tuning performance. To refine further, we optimize token construction with a streamlined pipeline that maintains excellent performance with almost no increase in computational expense compared to VPT. Exhaustive experiments show our proposed approach outperforms existing methods by a remarkable margin. For instance, it surpasses full fine-tuning in 19 out of 24 tasks while using less than 0.4% of learnable parameters on the FGVC and VTAB-1K benchmarks. Notably, our method significantly advances adaptation for self-supervised pretraining, achieving impressive task performance gains of at least 10% to 30%. Moreover, the experimental results demonstrate that the proposed SPT (Self-Prompt Tuning) is robust to prompt length and scales well with model capacity and training data size. We finally provide an insightful exploration into the amount of target data that facilitates the adaptation of pre-trained models to downstream tasks. The code is available at https://github.com/WangYZ1608/Self-Prompt-Tuning.
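The core idea above — initializing prompt tokens with prototypes of downstream patch-token features rather than at random — can be sketched roughly as clustering patch-token features collected from the downstream data and using the centroids as the initial prompts. The sketch below is an illustrative approximation, not the paper's exact SPT procedure; the function name, the use of plain k-means, and all hyperparameters are assumptions for demonstration.

```python
import numpy as np

def init_prompts_from_prototypes(patch_tokens, num_prompts, iters=10, seed=0):
    """Rough sketch of prototype-based prompt initialization.

    patch_tokens: (N, D) array of patch-token features, e.g. collected
    from forward passes of the frozen backbone over downstream data.
    Returns (num_prompts, D) centroids to use as initial prompt tokens.
    (Illustrative only; the actual SPT construction may differ.)
    """
    rng = np.random.default_rng(seed)
    # Seed centroids with randomly chosen patch tokens.
    idx = rng.choice(len(patch_tokens), num_prompts, replace=False)
    centroids = patch_tokens[idx].copy()
    for _ in range(iters):
        # Assign each token to its nearest centroid (squared Euclidean).
        dists = ((patch_tokens[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(axis=1)
        # Update each centroid to the mean of its assigned tokens.
        for k in range(num_prompts):
            members = patch_tokens[assign == k]
            if len(members):
                centroids[k] = members.mean(axis=0)
    return centroids
```

In practice the resulting centroids would replace the random (e.g. uniform or Xavier) prompt initialization in a VPT-style model before fine-tuning, which is what the abstract credits for the bulk of the performance gain.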

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Visual Prompt Tuning | FGVC | Mean Accuracy | 86 | SPT-Deep(ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) |
| Visual Prompt Tuning | FGVC | Mean Accuracy | 84.08 | SPT-Shallow(ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) |
| Visual Prompt Tuning | FGVC | Mean Accuracy | 83.26 | SPT-Deep(ViT-B/16_MAE_pretrained_ImageNet-1K) |
| Visual Prompt Tuning | FGVC | Mean Accuracy | 73.95 | SPT-Shallow(ViT-B/16_MAE_pretrained_ImageNet-1K) |
| Visual Prompt Tuning | VTAB-1k (Structured<8>) | Mean Accuracy | 59.23 | SPT-Deep(ViT-B/16_MAE_pretrained_ImageNet-1K) |
| Visual Prompt Tuning | VTAB-1k (Structured<8>) | Mean Accuracy | 58.36 | SPT-Deep(ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) |
| Visual Prompt Tuning | VTAB-1k (Structured<8>) | Mean Accuracy | 55.16 | SPT-Shallow(ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) |
| Visual Prompt Tuning | VTAB-1k (Structured<8>) | Mean Accuracy | 53.46 | SPT-Shallow(ViT-B/16_MAE_pretrained_ImageNet-1K) |
| Visual Prompt Tuning | VTAB-1k (Natural<7>) | Mean Accuracy | 76.2 | SPT-Deep(ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) |
| Visual Prompt Tuning | VTAB-1k (Natural<7>) | Mean Accuracy | 74.47 | SPT-Shallow(ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) |
| Visual Prompt Tuning | VTAB-1k (Natural<7>) | Mean Accuracy | 67.19 | SPT-Deep(ViT-B/16_MAE_pretrained_ImageNet-1K) |
| Visual Prompt Tuning | VTAB-1k (Natural<7>) | Mean Accuracy | 62.53 | SPT-Shallow(ViT-B/16_MAE_pretrained_ImageNet-1K) |
| Visual Prompt Tuning | VTAB-1k (Specialized<4>) | Mean Accuracy | 84.95 | SPT-Deep(ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) |
| Visual Prompt Tuning | VTAB-1k (Specialized<4>) | Mean Accuracy | 83.93 | SPT-Shallow(ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) |
| Visual Prompt Tuning | VTAB-1k (Specialized<4>) | Mean Accuracy | 83.15 | SPT-Deep(ViT-B/16_MAE_pretrained_ImageNet-1K) |
| Visual Prompt Tuning | VTAB-1k (Specialized<4>) | Mean Accuracy | 80.9 | SPT-Shallow(ViT-B/16_MAE_pretrained_ImageNet-1K) |

Related Papers

Prompt Disentanglement via Language Guidance and Representation Alignment for Domain Generalization (2025-07-03)
Attention to Burstiness: Low-Rank Bilinear Prompt Tuning (2025-06-28)
DA-VPT: Semantic-Guided Visual Prompt Tuning for Vision Transformers (2025-05-29)
Q-Adapt: Adapting LMM for Visual Quality Assessment with Progressive Instruction Tuning (2025-04-02)
Visual Variational Autoencoder Prompt Tuning (2025-03-22)
Iterative Prompt Relocation for Distribution-Adaptive Visual Prompt Tuning (2025-03-10)
Exploring Interpretability for Visual Prompt Tuning with Hierarchical Concepts (2025-03-08)
Adaptive Prompt: Unlocking the Power of Visual Prompt Tuning (2025-01-31)