
Visual Prompt Tuning (VPT) introduces only a small number of task-specific learnable parameters into the input space while keeping the entire pre-trained Transformer backbone frozen during downstream training. In practice, these additional parameters are prepended to the input sequence of each Transformer layer and learned together with a linear head during fine-tuning. VPT is especially effective in the low-data regime and maintains its advantage across data scales. It is also competitive across a range of Transformer scales and designs (ViT-Base/Large/Huge, Swin). Taken together, the results suggest that VPT is one of the most effective ways of adapting ever-growing vision backbones.
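To make the "small amount of parameters" concrete, the following is a minimal bookkeeping sketch in plain Python, not the authors' implementation. It lays out the token sequence each layer would see and counts only the parameters that are trained (the per-layer prompts and the linear head). The `VPTConfig` class, its field values (12 layers, width 768, 196 patches, 10 prompts, 100 classes), the helper names, and the placement of the prompts directly after a class token are all illustrative assumptions and do not come from the text above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class VPTConfig:
    # Hypothetical settings chosen for illustration only.
    num_layers: int = 12      # Transformer layers in the frozen backbone
    embed_dim: int = 768      # token embedding width
    num_patches: int = 196    # e.g. a 14 x 14 grid of image patches
    num_prompts: int = 10     # learnable prompt tokens per layer
    num_classes: int = 100    # size of the downstream label set


def layer_input_layout(cfg: VPTConfig) -> list[str]:
    """Token order fed to each layer (assumed): class token, prompts, patches."""
    prompts = [f"prompt_{i}" for i in range(cfg.num_prompts)]
    patches = [f"patch_{i}" for i in range(cfg.num_patches)]
    return ["cls"] + prompts + patches


def trainable_parameters(cfg: VPTConfig) -> dict[str, int]:
    """Count only what is updated: per-layer prompts plus the linear head."""
    prompt_params = cfg.num_layers * cfg.num_prompts * cfg.embed_dim
    head_params = cfg.embed_dim * cfg.num_classes + cfg.num_classes  # weights + bias
    return {
        "prompts": prompt_params,
        "head": head_params,
        "total": prompt_params + head_params,
    }


cfg = VPTConfig()
layout = layer_input_layout(cfg)
counts = trainable_parameters(cfg)
print(len(layout), counts)
```

Under these assumed settings each layer processes 207 tokens instead of 197, and 169,060 parameters are trained. The backbone weights never enter the count because they stay frozen.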

Visual Prompt Tuning

Benchmarks

Visual Prompt Tuning on FGVC

Visual Prompt Tuning on VTAB-1k (Natural, 7 tasks)

Visual Prompt Tuning on VTAB-1k (Specialized, 4 tasks)

Visual Prompt Tuning on VTAB-1k (Structured, 8 tasks)