HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering

Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, Christopher D. Manning

Published 2018-09-25, EMNLP 2018. Tasks: Question Answering, Multi-hop Question Answering

Abstract

Existing question answering (QA) datasets fail to train QA systems to perform complex reasoning and provide explanations for answers. We introduce HotpotQA, a new dataset with 113k Wikipedia-based question-answer pairs with four key features: (1) the questions require finding and reasoning over multiple supporting documents to answer; (2) the questions are diverse and not constrained to any pre-existing knowledge bases or knowledge schemas; (3) we provide sentence-level supporting facts required for reasoning, allowing QA systems to reason with strong supervision and explain the predictions; (4) we offer a new type of factoid comparison questions to test QA systems' ability to extract relevant facts and perform necessary comparison. We show that HotpotQA is challenging for the latest QA systems, and the supporting facts enable models to improve performance and make explainable predictions.
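To make features (1) and (3) concrete, here is a minimal sketch of how one multi-hop example with sentence-level supporting facts might be represented. The class name, field names, and the toy documents are illustrative assumptions, not something the abstract specifies; the released dataset files may be laid out differently.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MultiHopExample:
    """Hypothetical container for one question; field names are assumptions."""
    question: str
    answer: str
    # Each context entry is (document title, list of sentences in that document).
    context: List[Tuple[str, List[str]]]
    # Each supporting fact is (document title, sentence index within that document).
    supporting_facts: List[Tuple[str, int]]


def supporting_sentences(ex: MultiHopExample) -> List[str]:
    """Resolve (title, index) pairs to the sentences they point at."""
    by_title = {title: sentences for title, sentences in ex.context}
    return [by_title[title][idx] for title, idx in ex.supporting_facts]


# Toy placeholder content, invented purely for illustration.
example = MultiHopExample(
    question="In which country is the university that Person X attended?",
    answer="Country Z",
    context=[
        ("Person X", ["Person X is a scientist.", "Person X attended University Y."]),
        ("University Y", ["University Y is located in Country Z."]),
        ("Unrelated", ["This document is a distractor."]),
    ],
    supporting_facts=[("Person X", 1), ("University Y", 0)],
)

for sentence in supporting_sentences(example):
    print(sentence)
```

Answering the toy question requires one sentence from each of two documents, which is the multi-document reasoning the abstract describes; the supporting-fact pairs are what a system would be asked to predict in addition to the answer.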

Results

Task	Dataset	Metric	Value	Model
Question Answering	HotpotQA	ANS-EM	0.589	SAFSR model
Question Answering	HotpotQA	ANS-F1	0.716	SAFSR model
Question Answering	HotpotQA	JOINT-EM	0.345	SAFSR model
Question Answering	HotpotQA	JOINT-F1	0.598	SAFSR model
Question Answering	HotpotQA	SUP-EM	0.48	SAFSR model
Question Answering	HotpotQA	SUP-F1	0.757	SAFSR model
Question Answering	HotpotQA	ANS-EM	0.24	Baseline Model
Question Answering	HotpotQA	ANS-F1	0.329	Baseline Model
Question Answering	HotpotQA	JOINT-EM	0.019	Baseline Model
Question Answering	HotpotQA	JOINT-F1	0.162	Baseline Model
Question Answering	HotpotQA	SUP-EM	0.039	Baseline Model
Question Answering	HotpotQA	SUP-F1	0.377	Baseline Model
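
The table reports three metric families: ANS (the answer string), SUP (the predicted supporting facts), and JOINT (both together). The sketch below shows one way such per-example scores could be computed. It assumes that the joint precision and recall are the products of the answer and supporting-fact precision and recall, and it uses a simplified lowercase/whitespace normalization; both are assumptions made for illustration, and the function names are hypothetical rather than taken from an official evaluation script.

```python
from collections import Counter


def prf(pred_items, gold_items):
    """Precision, recall, and F1 over two bags of items."""
    overlap = sum((Counter(pred_items) & Counter(gold_items)).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(pred_items)
    recall = overlap / len(gold_items)
    return precision, recall, 2 * precision * recall / (precision + recall)


def evaluate_example(pred_answer, gold_answer, pred_facts, gold_facts):
    """Score one prediction; facts are (title, sentence index) pairs."""
    # Simplified normalization (assumption): lowercase and split on whitespace.
    pred_norm = pred_answer.strip().lower()
    gold_norm = gold_answer.strip().lower()
    ans_em = float(pred_norm == gold_norm)
    ans_p, ans_r, ans_f1 = prf(pred_norm.split(), gold_norm.split())

    pred_set, gold_set = set(pred_facts), set(gold_facts)
    sup_em = float(pred_set == gold_set)
    sup_p, sup_r, sup_f1 = prf(list(pred_set), list(gold_set))

    # Assumed joint definition: multiply precisions and recalls, then take F1.
    joint_p, joint_r = ans_p * sup_p, ans_r * sup_r
    joint_f1 = 2 * joint_p * joint_r / (joint_p + joint_r) if joint_p + joint_r > 0 else 0.0
    return {
        "ans_em": ans_em, "ans_f1": ans_f1,
        "sup_em": sup_em, "sup_f1": sup_f1,
        "joint_em": ans_em * sup_em, "joint_f1": joint_f1,
    }


scores = evaluate_example(
    pred_answer="Barack Obama",
    gold_answer="Obama",
    pred_facts=[("Doc A", 0), ("Doc B", 1)],
    gold_facts=[("Doc A", 0), ("Doc B", 2)],
)
print(scores)
```

Under this definition the joint scores are computed per example and then averaged, so a corpus-level JOINT value need not equal the product of the corpus-level ANS and SUP values.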
