REPLUG: Retrieval-Augmented Black-Box Language Models

Weijia Shi, Sewon Min, Michihiro Yasunaga, Minjoon Seo, Rich James, Mike Lewis, Luke Zettlemoyer, Wen-tau Yih

2023-01-30Question Answering Multi-task Language Understanding Retrieval MMLU Language Modelling

Abstract

We introduce REPLUG, a retrieval-augmented language modeling framework that treats the language model (LM) as a black box and augments it with a tuneable retrieval model. Unlike prior retrieval-augmented LMs that train language models with special cross attention mechanisms to encode the retrieved text, REPLUG simply prepends retrieved documents to the input for the frozen black-box LM. This simple design can be easily applied to any existing retrieval and language models. Furthermore, we show that the LM can be used to supervise the retrieval model, which can then find documents that help the LM make better predictions. Our experiments demonstrate that REPLUG with the tuned retriever significantly improves the performance of GPT-3 (175B) on language modeling by 6.3%, as well as the performance of Codex on five-shot MMLU by 5.1%.

Results

Task	Dataset	Metric	Value	Model
Question Answering	Natural Questions	EM	45.5	code-davinci-002 175B + REPLUG LSR (few-shot)
Question Answering	Natural Questions	EM	44.7	code-davinci-002 175B + REPLUG (few-shot)
Question Answering	TriviaQA	EM	77.3	code-davinci-002 175B + REPLUG LSR (Few-Shot)
Question Answering	TriviaQA	EM	76.8	code-davinci-002 175B + REPLUG (Few-Shot)

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment2025-07-21 From Roots to Rewards: Dynamic Tree Reasoning with RL2025-07-17 Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering2025-07-17 Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It2025-07-17 City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning2025-07-17 HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals2025-07-17 A Survey of Context Engineering for Large Language Models2025-07-17 MCoT-RE: Multi-Faceted Chain-of-Thought and Re-Ranking for Training-Free Zero-Shot Composed Image Retrieval2025-07-17