Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering

Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, Christopher D. Manning

2018-09-25 | EMNLP 2018 10 | Question Answering | Multi-hop Question Answering

Abstract

Existing question answering (QA) datasets fail to train QA systems to perform complex reasoning and provide explanations for answers. We introduce HotpotQA, a new dataset with 113k Wikipedia-based question-answer pairs with four key features: (1) the questions require finding and reasoning over multiple supporting documents to answer; (2) the questions are diverse and not constrained to any pre-existing knowledge bases or knowledge schemas; (3) we provide sentence-level supporting facts required for reasoning, allowing QA systems to reason with strong supervision and explain the predictions; (4) we offer a new type of factoid comparison questions to test QA systems' ability to extract relevant facts and perform necessary comparison. We show that HotpotQA is challenging for the latest QA systems, and the supporting facts enable models to improve performance and make explainable predictions.

Results

Task                Dataset   Metric    Value  Model
Question Answering  HotpotQA  ANS-EM    0.589  SAFSR model
Question Answering  HotpotQA  ANS-F1    0.716  SAFSR model
Question Answering  HotpotQA  JOINT-EM  0.345  SAFSR model
Question Answering  HotpotQA  JOINT-F1  0.598  SAFSR model
Question Answering  HotpotQA  SUP-EM    0.48   SAFSR model
Question Answering  HotpotQA  SUP-F1    0.757  SAFSR model
Question Answering  HotpotQA  ANS-EM    0.24   Baseline Model
Question Answering  HotpotQA  ANS-F1    0.329  Baseline Model
Question Answering  HotpotQA  JOINT-EM  0.019  Baseline Model
Question Answering  HotpotQA  JOINT-F1  0.162  Baseline Model
Question Answering  HotpotQA  SUP-EM    0.039  Baseline Model
Question Answering  HotpotQA  SUP-F1    0.377  Baseline Model
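
This page does not define the metrics in the table. As a rough sketch, assuming ANS-* scores the answer string with SQuAD-style token overlap, SUP-* scores the predicted set of supporting sentences, and JOINT-* combines the two per example, the computation might look like the Python below. All function names are hypothetical, the representation of a supporting fact as a (title, sentence index) pair is an assumption, and the normalization rules are a common convention rather than something stated here.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, 0.0 when both are zero."""
    total = precision + recall
    return 2 * precision * recall / total if total > 0 else 0.0


def answer_scores(prediction: str, gold: str):
    """Return (em, precision, recall, f1) for one answer string (assumed ANS-*)."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    em = float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        # em is 1.0 only when both sides normalize to nothing; otherwise all zeros.
        return em, em, em, em
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return em, precision, recall, f1_from(precision, recall)


def support_scores(predicted, gold):
    """Return (em, precision, recall, f1) over supporting-fact sets (assumed SUP-*)."""
    pred_set, gold_set = set(predicted), set(gold)
    hits = len(pred_set & gold_set)
    precision = hits / len(pred_set) if pred_set else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    return float(pred_set == gold_set), precision, recall, f1_from(precision, recall)


def joint_scores(ans, sup):
    """Assumed JOINT-*: multiply precisions and recalls, then recompute F1."""
    ans_em, ans_p, ans_r, _ = ans
    sup_em, sup_p, sup_r, _ = sup
    return ans_em * sup_em, f1_from(ans_p * sup_p, ans_r * sup_r)


# Made-up example: correct answer, one extra supporting sentence predicted.
ans = answer_scores("The Eiffel Tower", "Eiffel Tower")
sup = support_scores([("A", 0), ("B", 1), ("B", 2)], [("A", 0), ("B", 1)])
print(ans, sup, joint_scores(ans, sup))
```

Under these assumptions, scores are computed per example and then averaged over the dataset, so a JOINT value in the table need not equal the product of the corresponding ANS and SUP values.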

Related Papers

From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering (2025-07-17)
Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It (2025-07-17)
City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
Describe Anything Model for Visual Question Answering on Text-rich Images (2025-07-16)
Is This Just Fantasy? Language Model Representations Reflect Human Judgments of Event Plausibility (2025-07-16)
Warehouse Spatial Question Answering with LLM Agent (2025-07-14)
Evaluating Attribute Confusion in Fashion Text-to-Image Generation (2025-07-09)