Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning

Haokun Liu, Derek Tam, Mohammed Muqeeth, Jay Mohta, Tenghao Huang, Mohit Bansal, Colin Raffel

Published: 2022-05-11
Tasks: Parameter-Efficient Fine-Tuning, Few-Shot Text Classification
Links: Paper · PDF · Code (official) · Code

Abstract

Few-shot in-context learning (ICL) enables pre-trained language models to perform a previously-unseen task without any gradient-based training by feeding a small number of training examples as part of the input. ICL incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made. Parameter-efficient fine-tuning (PEFT) (e.g. adapter modules, prompt tuning, sparse update methods, etc.) offers an alternative paradigm where a small set of parameters are trained to enable a model to perform the new task. In this paper, we rigorously compare few-shot ICL and PEFT and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs. Along the way, we introduce a new PEFT method called (IA)$^3$ that scales activations by learned vectors, attaining stronger performance while only introducing a relatively tiny amount of new parameters. We also propose a simple recipe based on the T0 model called T-Few that can be applied to new tasks without task-specific tuning or modifications. We validate the effectiveness of T-Few on completely unseen tasks by applying it to the RAFT benchmark, attaining super-human performance for the first time and outperforming the state-of-the-art by 6% absolute. All of the code used in our experiments is publicly available.
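The core of (IA)$^3$ described in the abstract is simple: the frozen model's activations are rescaled elementwise by learned vectors, so the only trained parameters are one vector per rescaled activation. The sketch below is a hypothetical minimal illustration of that idea (the function and variable names are ours, not from the paper's codebase), using NumPy in place of a real transformer.

```python
import numpy as np

def ia3_scale(activations, learned_vector):
    """Rescale each activation dimension by a learned per-dimension factor.

    activations: array of shape (seq_len, d) from the frozen model
    learned_vector: array of shape (d,) -- the only trained parameters
    """
    # Broadcasting multiplies every position in the sequence by the same vector.
    return activations * learned_vector

rng = np.random.default_rng(0)
seq_len, d = 4, 8
keys = rng.standard_normal((seq_len, d))

# Initialized to ones, the scaling leaves the pre-trained model unchanged.
l_k = np.ones(d)
assert np.allclose(ia3_scale(keys, l_k), keys)

# Fine-tuning updates only the d entries of l_k, not the d*d attention
# weight matrix that produced `keys` -- hence "parameter-efficient".
l_k = l_k + 0.1 * rng.standard_normal(d)
scaled = ia3_scale(keys, l_k)
print(scaled.shape)
```

In the paper this rescaling is applied to the attention keys and values and to the intermediate feed-forward activations, so the number of new parameters grows only linearly with the hidden size.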

Results

Identical T-Few results are listed under three leaderboards (Text Classification, Few-Shot Text Classification, and Classification), all on the RAFT dataset; the table below reports them once. Metric names are RAFT subset abbreviations; Avg is the average over the eleven subsets.

| Dataset | Metric | Value | Model |
| RAFT | Over | 0.95  | T-Few |
| RAFT | ADE  | 0.804 | T-Few |
| RAFT | Avg  | 0.758 | T-Few |
| RAFT | B77  | 0.695 | T-Few |
| RAFT | NIS  | 0.833 | T-Few |
| RAFT | OSE  | 0.676 | T-Few |
| RAFT | SOT  | 0.915 | T-Few |
| RAFT | SRI  | 0.508 | T-Few |
| RAFT | TAI  | 0.736 | T-Few |
| RAFT | TC   | 0.879 | T-Few |
| RAFT | TEH  | 0.586 | T-Few |
| RAFT | ToS  | 0.75  | T-Few |

Related Papers

Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy (2025-07-17)
LoSiA: Efficient High-Rank Fine-Tuning via Subnet Localization and Optimization (2025-07-06)
Exploring Adapter Design Tradeoffs for Low Resource Music Generation (2025-06-26)
WordCon: Word-level Typography Control in Scene Text Rendering (2025-06-26)
Optimising Language Models for Downstream Tasks: A Post-Training Perspective (2025-06-26)
Progtuning: Progressive Fine-tuning Framework for Transformer-based Language Models (2025-06-26)
Detecting Referring Expressions in Visually Grounded Dialogue with Autoregressive Language Models (2025-06-26)
ARD-LoRA: Dynamic Rank Allocation for Parameter-Efficient Fine-Tuning of Foundation Models with Heterogeneous Adaptation Needs (2025-06-23)