Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Reading Wikipedia to Answer Open-Domain Questions

Danqi Chen, Adam Fisch, Jason Weston, Antoine Bordes

2017-03-31 · ACL 2017 · Tasks: Reading Comprehension, Question Answering, Open-Domain Question Answering, Retrieval
Paper · PDF · Code (official implementation plus several community implementations)

Abstract

This paper proposes to tackle open-domain question answering using Wikipedia as the unique knowledge source: the answer to any factoid question is a text span in a Wikipedia article. This task of machine reading at scale combines the challenges of document retrieval (finding the relevant articles) with that of machine comprehension of text (identifying the answer spans from those articles). Our approach combines a search component based on bigram hashing and TF-IDF matching with a multi-layer recurrent neural network model trained to detect answers in Wikipedia paragraphs. Our experiments on multiple existing QA datasets indicate that (1) both modules are highly competitive with respect to existing counterparts and (2) multitask learning using distant supervision on their combination is an effective complete system on this challenging task.
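The retrieval component described above ranks articles by TF-IDF similarity over hashed unigram and bigram counts. A minimal sketch of that idea, under stated assumptions: Python's built-in `hash()` stands in for the murmur hashing used in the released DrQA code, and the IDF weighting is a common smoothed variant, not necessarily the exact formula from the paper.

```python
import math
import re
from collections import Counter


def hashed_ngrams(text, num_buckets=2**24):
    """Count unigrams and bigrams, hashed into a fixed number of buckets.

    Hashing (the "bigram hashing" of the abstract) keeps the feature
    space bounded regardless of vocabulary size. Python's hash() is a
    stand-in for the murmur hashing used by the official DrQA retriever.
    """
    toks = re.findall(r"\w+", text.lower())
    grams = toks + [" ".join(pair) for pair in zip(toks, toks[1:])]
    return Counter(hash(g) % num_buckets for g in grams)


def tfidf_rank(query, docs):
    """Return document indices ranked best-first by TF-IDF match score.

    Assumption: a smoothed IDF, idf = log((N + 1) / (df + 0.5)); the
    paper does not spell out its exact weighting here.
    """
    doc_counts = [hashed_ngrams(d) for d in docs]
    n = len(docs)
    df = Counter()
    for counts in doc_counts:
        df.update(counts.keys())

    def idf(bucket):
        return math.log((n + 1) / (df[bucket] + 0.5))

    q = hashed_ngrams(query)
    scored = []
    for i, counts in enumerate(doc_counts):
        score = sum(q[b] * counts[b] * idf(b) ** 2 for b in q)
        scored.append((score, i))
    return [i for _, i in sorted(scored, reverse=True)]
```

For example, given a handful of candidate paragraphs, `tfidf_rank("What is the capital of France?", docs)` puts the paragraph sharing the bigrams "the capital" and "of france" first; the selected top-k paragraphs would then be handed to the neural Document Reader.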

Results

Task | Dataset | Metric | Value | Model
Question Answering | SQuAD1.1 dev | EM | 69.5 | DrQA (Document Reader only)
Question Answering | SQuAD1.1 dev | F1 | 78.8 | DrQA (Document Reader only)
Question Answering | Quasar-T | EM | 37.7 | DrQA
Question Answering | SQuAD1.1 | EM | 70.733 | Document Reader (single model)
Question Answering | SQuAD1.1 | F1 | 79.353 | Document Reader (single model)
Question Answering | Natural Questions (long) | F1 | 46.1 | DrQA
Question Answering | SearchQA | EM | 41.9 | DrQA
Question Answering | SQuAD1.1 | EM | 70 | DrQA
Open-Domain Question Answering | SearchQA | EM | 41.9 | DrQA
Open-Domain Question Answering | SQuAD1.1 | EM | 70 | DrQA
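The EM and F1 values in the table follow the standard SQuAD-style evaluation: exact match between normalized predicted and gold answer strings, and token-level F1 overlap between them. A minimal sketch of these two metrics (the normalization here — lowercasing, stripping articles and punctuation — mirrors the usual SQuAD evaluation convention):

```python
import re
from collections import Counter


def normalize(text):
    """SQuAD-style answer normalization: lowercase, drop the articles
    a/an/the, remove punctuation, and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())


def exact_match(pred, gold):
    """EM: 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(pred) == normalize(gold))


def f1_score(pred, gold):
    """Token-level F1 between predicted and gold answer spans."""
    pred_toks = normalize(pred).split()
    gold_toks = normalize(gold).split()
    overlap = sum((Counter(pred_toks) & Counter(gold_toks)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)
```

So a prediction of "the Eiffel Tower" against gold "Eiffel Tower" scores EM 1.0 once articles are stripped, while a partially overlapping span earns partial F1 credit; dataset-level scores are these values averaged over all questions (taking the max over multiple gold answers where provided).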

Related Papers

From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering (2025-07-17)
Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It (2025-07-17)
City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals (2025-07-17)
A Survey of Context Engineering for Large Language Models (2025-07-17)
MCoT-RE: Multi-Faceted Chain-of-Thought and Re-Ranking for Training-Free Zero-Shot Composed Image Retrieval (2025-07-17)
Describe Anything Model for Visual Question Answering on Text-rich Images (2025-07-16)