MS MARCO

Microsoft Machine Reading Comprehension Dataset

Texts | Custom (research-only, non-commercial) | Introduced 2016-01-01

MS MARCO (Microsoft MAchine Reading COmprehension) is a collection of datasets focused on deep learning in search. The first dataset was a question answering dataset featuring 100,000 real Bing questions, each with a human-generated answer. Over time the collection was extended with a 1,000,000-question dataset, a natural language generation dataset, a passage ranking dataset, a keyphrase extraction dataset, a crawling dataset, and a conversational search dataset.

Source: https://microsoft.github.io/msmarco/
Image Source: https://arxiv.org/pdf/1809.08267.pdf