Papers With Code 2 | ML Benchmarks, SotA Results & Code

A synthetically generated QA dataset for text-based reasoning. For each sample, composed of a True/False question over two pieces of information required to answer it (the context), we create multiple versions of different lengths by embedding the context parts within longer, irrelevant texts. To ensure that models utilize their entire input, the dataset is composed of tasks for which both pieces of information must reasoned over together in order to correctly answer the question. At the same time, we keep the tasks simple enough such that models answer most of them correctly when the information pieces are presented on their own, with no additional padding.

FlenQA