NewsQA

TextsCustomIntroduced 2017-01-01

The NewsQA dataset is a crowd-sourced machine reading comprehension dataset of 120,000 question-answer pairs.

  • Documents are CNN news articles.
  • Questions are written by human users in natural language.
  • Answers may be multiword passages of the source text.
  • Questions may be unanswerable.
  • NewsQA is collected using a 3-stage, siloed process.
  • Questioners see only an article’s headline and highlights.
  • Answerers see the question and the full article, then select an answer passage.
  • Validators see the article, the question, and a set of answers that they rank.
  • NewsQA is more natural and more challenging than previous datasets.

Source: https://www.microsoft.com/en-us/research/project/newsqa-dataset/ Image Source: Trischler et al