Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, Wen-tau Yih
Open-domain question answering relies on efficient passage retrieval to select candidate contexts, where traditional sparse vector space models, such as TF-IDF or BM25, are the de facto method. In this work, we show that retrieval can be practically implemented using dense representations alone, where embeddings are learned from a small number of questions and passages by a simple dual-encoder framework. When evaluated on a wide range of open-domain QA datasets, our dense retriever outperforms a strong Lucene-BM25 system by 9%-19% absolute in terms of top-20 passage retrieval accuracy, and helps our end-to-end QA system establish new state-of-the-art results on multiple open-domain QA benchmarks.
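The dual-encoder retrieval described above can be sketched as follows: a question encoder and a passage encoder each map text to a dense vector, and passages are ranked by inner-product similarity to the question. This is a minimal illustration, not the paper's trained BERT encoders; the toy random embeddings and the `rank_passages` helper are placeholders for learned representations.

```python
# Sketch of dense retrieval with a dual-encoder: passages and questions
# are embedded as dense vectors and ranked by inner product,
# sim(q, p) = E_Q(q) . E_P(p). Embeddings here are toy stand-ins.
import numpy as np

def rank_passages(q_vec, passage_vecs, k=2):
    """Return indices and scores of the top-k passages by inner product."""
    scores = passage_vecs @ q_vec          # one score per passage
    top = np.argsort(-scores)[:k]          # highest score first
    return top, scores[top]

rng = np.random.default_rng(0)
# Five toy "passage embeddings" of dimension 8, L2-normalized so the
# inner product equals cosine similarity (a common practical choice).
P = rng.standard_normal((5, 8))
P /= np.linalg.norm(P, axis=1, keepdims=True)
# A question whose embedding matches passage 3 exactly, by construction.
q = P[3].copy()

top_idx, top_scores = rank_passages(q, P, k=2)
print("top passages:", top_idx.tolist())
```

In the full system, the passage vectors are precomputed offline and indexed (e.g., with an approximate nearest-neighbor library), so only the question is encoded at query time.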
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Question Answering | Natural Questions | EM | 41.5 | DPR |
| Question Answering | WebQuestions | EM | 42.4 | DPR |
| Question Answering | TriviaQA | EM | 56.8 | DPR |
| Information Retrieval | Natural Questions | Top-100 accuracy | 86.0 | DPR |
| Information Retrieval | Natural Questions | Top-20 accuracy | 79.4 | DPR |