ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning

Weihao Yu, Zi-Hang Jiang, Yanfei Dong, Jiashi Feng

2020-02-11, ICLR 2020
Tasks: Reading Comprehension, Question Answering, Logical Reasoning, Logical Reasoning Question Answering, Logical Reasoning Reading Comprehension, Machine Reading Comprehension

Abstract

Recent powerful pre-trained language models have achieved remarkable performance on most of the popular datasets for reading comprehension. It is time to introduce more challenging datasets to push the development of this field towards more comprehensive reasoning of text. In this paper, we introduce a new Reading Comprehension dataset requiring logical reasoning (ReClor) extracted from standardized graduate admission examinations. As earlier studies suggest, human-annotated datasets usually contain biases, which are often exploited by models to achieve high accuracy without truly understanding the text. In order to comprehensively evaluate the logical reasoning ability of models on ReClor, we propose to identify biased data points and separate them into EASY set while the rest as HARD set. Empirical results show that state-of-the-art models have an outstanding ability to capture biases contained in the dataset with high accuracy on EASY set. However, they struggle on HARD set with poor performance near that of random guess, indicating more research is needed to essentially enhance the logical reasoning ability of current models.
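The abstract describes separating biased data points into an EASY set and leaving the rest as the HARD set, but this page does not say how a data point is judged biased. The following is a minimal sketch of one possible criterion, not the authors' confirmed procedure: it assumes a hypothetical model that sees only the answer options (no context or question) and is trained in several runs, and it marks an example as EASY when every run still picks the gold option. The function name, the argument names, and the "every run" rule are all illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def split_easy_hard(
    labels: Sequence[int],
    partial_input_preds: Sequence[Sequence[int]],
) -> Tuple[List[int], List[int]]:
    """Return (easy_indices, hard_indices).

    labels[i] is the gold option index of example i.
    partial_input_preds[r][i] is the option chosen for example i by run r of a
    hypothetical model that never sees the context or the question.
    Assumed rule: an example is EASY when every run picks the gold option.
    """
    easy: List[int] = []
    hard: List[int] = []
    for i, gold in enumerate(labels):
        if all(run[i] == gold for run in partial_input_preds):
            easy.append(i)
        else:
            hard.append(i)
    return easy, hard


# Toy data: five examples, two runs of the options-only model.
labels = [0, 2, 1, 3, 2]
runs = [
    [0, 2, 3, 3, 1],
    [0, 1, 1, 3, 2],
]
easy_idx, hard_idx = split_easy_hard(labels, runs)
print("EASY:", easy_idx)
print("HARD:", hard_idx)
```

Requiring agreement across all runs is a conservative choice: an example that a single run guesses correctly by chance stays in the HARD set.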

Results

Task                             Dataset  Metric           Value  Model
Reading Comprehension            ReClor   Test             56     XLNet-large
Reading Comprehension            ReClor   Test             55.6   RoBERTa-large
Reading Comprehension            ReClor   Test             50.4   XLNet-base
Reading Comprehension            ReClor   Test             49.8   BERT-large
Reading Comprehension            ReClor   Test             48.5   RoBERTa-base
Reading Comprehension            ReClor   Test             47.3   BERT-base
Reading Comprehension            ReClor   Accuracy         56     XLNet-large
Reading Comprehension            ReClor   Accuracy (easy)  75.7   XLNet-large
Reading Comprehension            ReClor   Accuracy (hard)  40.5   XLNet-large
Reading Comprehension            ReClor   Accuracy         55.6   RoBERTa-large
Reading Comprehension            ReClor   Accuracy (easy)  75.5   RoBERTa-large
Reading Comprehension            ReClor   Accuracy (hard)  40     RoBERTa-large
Reading Comprehension            ReClor   Accuracy         49.8   BERT-large
Reading Comprehension            ReClor   Accuracy (easy)  72     BERT-large
Reading Comprehension            ReClor   Accuracy (hard)  32.3   BERT-large
Question Answering               ReClor   Accuracy         56     XLNet-large
Question Answering               ReClor   Accuracy (easy)  75.7   XLNet-large
Question Answering               ReClor   Accuracy (hard)  40.5   XLNet-large
Question Answering               ReClor   Accuracy         55.6   RoBERTa-large
Question Answering               ReClor   Accuracy (easy)  75.5   RoBERTa-large
Question Answering               ReClor   Accuracy (hard)  40     RoBERTa-large
Question Answering               ReClor   Accuracy         49.8   BERT-large
Question Answering               ReClor   Accuracy (easy)  72     BERT-large
Question Answering               ReClor   Accuracy (hard)  32.3   BERT-large
Visual Question Answering (VQA)  ReClor   Accuracy         56     XLNet-large
Visual Question Answering (VQA)  ReClor   Accuracy (easy)  75.7   XLNet-large
Visual Question Answering (VQA)  ReClor   Accuracy (hard)  40.5   XLNet-large
Visual Question Answering (VQA)  ReClor   Accuracy         55.6   RoBERTa-large
Visual Question Answering (VQA)  ReClor   Accuracy (easy)  75.5   RoBERTa-large
Visual Question Answering (VQA)  ReClor   Accuracy (hard)  40     RoBERTa-large
Visual Question Answering (VQA)  ReClor   Accuracy         49.8   BERT-large
Visual Question Answering (VQA)  ReClor   Accuracy (easy)  72     BERT-large
Visual Question Answering (VQA)  ReClor   Accuracy (hard)  32.3   BERT-large
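
The overall, easy, and hard accuracies in the results table can be cross-checked against each other. If the overall accuracy is the size-weighted average of the EASY and HARD accuracies (an assumption, since this page does not give the subset sizes), the share of EASY questions can be solved for from each model's three numbers. The sketch below does that and also reports how far each model's HARD accuracy sits above random guessing. The 25% chance level assumes four answer options per question, which is not stated on this page.

```python
# (overall, easy, hard) accuracy in percent, copied from the results table.
results = {
    "XLNet-large": (56.0, 75.7, 40.5),
    "RoBERTa-large": (55.6, 75.5, 40.0),
    "BERT-large": (49.8, 72.0, 32.3),
}

# Assumption: four answer options per question, so chance accuracy is 25%.
RANDOM_GUESS = 25.0


def implied_easy_fraction(overall: float, easy: float, hard: float) -> float:
    """Solve overall = f * easy + (1 - f) * hard for the EASY share f."""
    return (overall - hard) / (easy - hard)


for model, (overall, easy, hard) in results.items():
    f = implied_easy_fraction(overall, easy, hard)
    margin = hard - RANDOM_GUESS
    print(f"{model}: implied EASY share {f:.2f}, HARD margin over chance {margin:.1f} points")
```

All three models give an implied EASY share of about 0.44, so the reported numbers are mutually consistent under the weighted-average assumption.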

Related Papers

From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering (2025-07-17)
Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It (2025-07-17)
City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
Describe Anything Model for Visual Question Answering on Text-rich Images (2025-07-16)
Is This Just Fantasy? Language Model Representations Reflect Human Judgments of Event Plausibility (2025-07-16)
Warehouse Spatial Question Answering with LLM Agent (2025-07-14)
Evaluating Attribute Confusion in Fashion Text-to-Image Generation (2025-07-09)