Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs

Jonas Hübotter, Sascha Bongni, Ido Hakimi, Andreas Krause

2024-10-10 · Active Learning · Retrieval · Language Modelling
Paper · PDF · Code (official)

Abstract

Recent efforts in fine-tuning language models often rely on automatic data selection, commonly using Nearest Neighbors retrieval from large datasets. However, we theoretically show that this approach tends to select redundant data, limiting its effectiveness or even hurting performance. To address this, we introduce SIFT, a data selection algorithm designed to reduce uncertainty about the model's response given a prompt, which unifies ideas from retrieval and active learning. Whereas Nearest Neighbor retrieval typically fails in the presence of information duplication, SIFT accounts for information duplication and optimizes the overall information gain of the selected examples. We focus our evaluations on fine-tuning at test-time for prompt-specific language modeling on the Pile dataset, and show that SIFT consistently outperforms Nearest Neighbor retrieval, with minimal computational overhead. Moreover, we show that our uncertainty estimates can predict the performance gain of test-time fine-tuning, and use this to develop an adaptive algorithm that invests test-time compute proportional to realized performance gains. We provide the $\texttt{activeft}$ (Active Fine-Tuning) library which can be used as a drop-in replacement for Nearest Neighbor retrieval.
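The abstract's core claim is that Nearest Neighbor retrieval selects redundant data, while SIFT greedily picks examples that maximize information gain about the model's response to the prompt. The sketch below illustrates this idea with a simple dot-product kernel over embeddings: each step selects the candidate that most reduces the posterior variance at the prompt, so a near-duplicate of an already-selected example adds little and a diverse example wins. This is an illustrative toy, not the paper's actual algorithm; for that, use the authors' activeft library. The function name and kernel choice here are assumptions for illustration.

```python
import numpy as np

def sift_style_select(prompt_emb, cand_embs, k=3, noise=1e-3):
    """Greedily select k candidates that most reduce predictive
    uncertainty about the prompt, under a dot-product kernel with
    observation noise. A toy sketch of SIFT's information-gain
    criterion, not the official activeft implementation."""
    selected = []
    remaining = list(range(len(cand_embs)))
    for _ in range(k):
        best_i, best_var = None, np.inf
        for i in remaining:
            idx = selected + [i]
            S = cand_embs[idx]                      # (m, d) trial set
            K = S @ S.T + noise * np.eye(len(idx))  # kernel matrix
            ks = S @ prompt_emb                     # similarities to prompt
            # posterior variance at the prompt after observing idx
            var = prompt_emb @ prompt_emb - ks @ np.linalg.solve(K, ks)
            if var < best_var:
                best_i, best_var = i, var
        selected.append(best_i)
        remaining.remove(best_i)
    return selected

# Duplication example: candidates 0 and 1 are identical and closest
# to the prompt; candidate 2 is farther but diverse. Nearest Neighbor
# top-2 would return the duplicate pair [0, 1]; the greedy
# uncertainty-reduction rule picks one duplicate, then the diverse point.
prompt = np.array([1.0, 0.0])
cands = np.array([[0.9, 0.1], [0.9, 0.1], [0.5, 0.5]])
print(sift_style_select(prompt, cands, k=2))  # → [0, 2]
```

The key design point is that the posterior-variance update automatically discounts information already covered by the selected set, which is exactly the failure mode of plain similarity ranking that the abstract describes.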

Results

Task               | Dataset  | Metric        | Value | Model
-------------------|----------|---------------|-------|--------------------------------------------------
Language Modelling | The Pile | Bits per byte | 0.557 | Test-Time Fine-Tuning with SIFT + Llama-3.2 (3B)
Language Modelling | The Pile | Bits per byte | 0.595 | Test-Time Fine-Tuning with SIFT + Phi-3 (3.8B)
Language Modelling | The Pile | Bits per byte | 0.606 | Test-Time Fine-Tuning with SIFT + Llama-3.2 (1B)
Language Modelling | The Pile | Bits per byte | 0.629 | Gemma-2 27B
Language Modelling | The Pile | Bits per byte | 0.64  | Llama-3.2 3B
Language Modelling | The Pile | Bits per byte | 0.651 | Phi-3 14B
Language Modelling | The Pile | Bits per byte | 0.67  | Gemma-2 9B
Language Modelling | The Pile | Bits per byte | 0.678 | Phi-3 7B
Language Modelling | The Pile | Bits per byte | 0.679 | Phi-3 3.8B
Language Modelling | The Pile | Bits per byte | 0.697 | Llama-3.2 1B
Language Modelling | The Pile | Bits per byte | 0.721 | Gemma-2 2B
Language Modelling | The Pile | Bits per byte | 0.737 | Llama-3.2-Instruct 3B
Language Modelling | The Pile | Bits per byte | 0.762 | Test-Time Fine-Tuning with SIFT + GPT-2 (774M)
Language Modelling | The Pile | Bits per byte | 0.807 | Llama-3.2-Instruct 1B
Language Modelling | The Pile | Bits per byte | 0.862 | Test-Time Fine-Tuning with SIFT + GPT-2 (124M)

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals (2025-07-17)
A Survey of Context Engineering for Large Language Models (2025-07-17)
MCoT-RE: Multi-Faceted Chain-of-Thought and Re-Ranking for Training-Free Zero-Shot Composed Image Retrieval (2025-07-17)
Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations (2025-07-17)