Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs

Jonas Hübotter, Sascha Bongni, Ido Hakimi, Andreas Krause

2024-10-10 · Active Learning · Retrieval · Language Modelling
Paper · PDF · Code (official)

Abstract

Recent efforts in fine-tuning language models often rely on automatic data selection, commonly using Nearest Neighbors retrieval from large datasets. However, we theoretically show that this approach tends to select redundant data, limiting its effectiveness or even hurting performance. To address this, we introduce SIFT, a data selection algorithm designed to reduce uncertainty about the model's response given a prompt, which unifies ideas from retrieval and active learning. Whereas Nearest Neighbor retrieval typically fails in the presence of information duplication, SIFT accounts for information duplication and optimizes the overall information gain of the selected examples. We focus our evaluations on fine-tuning at test-time for prompt-specific language modeling on the Pile dataset, and show that SIFT consistently outperforms Nearest Neighbor retrieval, with minimal computational overhead. Moreover, we show that our uncertainty estimates can predict the performance gain of test-time fine-tuning, and use this to develop an adaptive algorithm that invests test-time compute proportional to realized performance gains. We provide the $\texttt{activeft}$ (Active Fine-Tuning) library which can be used as a drop-in replacement for Nearest Neighbor retrieval.
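The abstract's core claim is that Nearest Neighbor retrieval selects redundant data, while SIFT greedily picks examples that maximize information gain about the model's response to the prompt. The sketch below illustrates this idea with a simple dot-product kernel over embeddings: each step selects the candidate that most reduces the posterior variance at the prompt, so a near-duplicate of an already-selected example adds little and a diverse example wins. This is an illustrative toy, not the paper's actual algorithm; for that, use the authors' activeft library. The function name and kernel choice here are assumptions for illustration.

```python
import numpy as np

def sift_style_select(prompt_emb, cand_embs, k=3, noise=1e-3):
    """Greedily select k candidates that most reduce predictive
    uncertainty about the prompt, under a dot-product kernel with
    observation noise. A toy sketch of SIFT's information-gain
    criterion, not the official activeft implementation."""
    selected = []
    remaining = list(range(len(cand_embs)))
    for _ in range(k):
        best_i, best_var = None, np.inf
        for i in remaining:
            idx = selected + [i]
            S = cand_embs[idx]                      # (m, d) trial set
            K = S @ S.T + noise * np.eye(len(idx))  # kernel matrix
            ks = S @ prompt_emb                     # similarities to prompt
            # posterior variance at the prompt after observing idx
            var = prompt_emb @ prompt_emb - ks @ np.linalg.solve(K, ks)
            if var < best_var:
                best_i, best_var = i, var
        selected.append(best_i)
        remaining.remove(best_i)
    return selected

# Duplication example: candidates 0 and 1 are identical and closest
# to the prompt; candidate 2 is farther but diverse. Nearest Neighbor
# top-2 would return the duplicate pair [0, 1]; the greedy
# uncertainty-reduction rule picks one duplicate, then the diverse point.
prompt = np.array([1.0, 0.0])
cands = np.array([[0.9, 0.1], [0.9, 0.1], [0.5, 0.5]])
print(sift_style_select(prompt, cands, k=2))  # → [0, 2]
```

The key design point is that the posterior-variance update automatically discounts information already covered by the selected set, which is exactly the failure mode of plain similarity ranking that the abstract describes.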

Results

Task               | Dataset  | Metric        | Value | Model
-------------------|----------|---------------|-------|--------------------------------------------------
Language Modelling | The Pile | Bits per byte | 0.557 | Test-Time Fine-Tuning with SIFT + Llama-3.2 (3B)
Language Modelling | The Pile | Bits per byte | 0.595 | Test-Time Fine-Tuning with SIFT + Phi-3 (3.8B)
Language Modelling | The Pile | Bits per byte | 0.606 | Test-Time Fine-Tuning with SIFT + Llama-3.2 (1B)
Language Modelling | The Pile | Bits per byte | 0.629 | Gemma-2 27B
Language Modelling | The Pile | Bits per byte | 0.64  | Llama-3.2 3B
Language Modelling | The Pile | Bits per byte | 0.651 | Phi-3 14B
Language Modelling | The Pile | Bits per byte | 0.67  | Gemma-2 9B
Language Modelling | The Pile | Bits per byte | 0.678 | Phi-3 7B
Language Modelling | The Pile | Bits per byte | 0.679 | Phi-3 3.8B
Language Modelling | The Pile | Bits per byte | 0.697 | Llama-3.2 1B
Language Modelling | The Pile | Bits per byte | 0.721 | Gemma-2 2B
Language Modelling | The Pile | Bits per byte | 0.737 | Llama-3.2-Instruct 3B
Language Modelling | The Pile | Bits per byte | 0.762 | Test-Time Fine-Tuning with SIFT + GPT-2 (774M)
Language Modelling | The Pile | Bits per byte | 0.807 | Llama-3.2-Instruct 1B
Language Modelling | The Pile | Bits per byte | 0.862 | Test-Time Fine-Tuning with SIFT + GPT-2 (124M)

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals (2025-07-17)
A Survey of Context Engineering for Large Language Models (2025-07-17)
MCoT-RE: Multi-Faceted Chain-of-Thought and Re-Ranking for Training-Free Zero-Shot Composed Image Retrieval (2025-07-17)
Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations (2025-07-17)