Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


A Simple Method for Commonsense Reasoning

Trieu H. Trinh, Quoc V. Le

Published: 2018-06-07
Tasks: Coreference Resolution, Common Sense Reasoning, Natural Language Understanding, Multiple-choice

Abstract

Commonsense reasoning is a long-standing challenge for deep learning. For example, it is difficult to use neural networks to tackle the Winograd Schema dataset (Levesque et al., 2011). In this paper, we present a simple method for commonsense reasoning with neural networks, using unsupervised learning. Key to our method is the use of language models, trained on a massive amount of unlabeled data, to score multiple choice questions posed by commonsense reasoning tests. On both Pronoun Disambiguation and Winograd Schema challenges, our models outperform previous state-of-the-art methods by a large margin, without using expensive annotated knowledge bases or hand-engineered features. We train an array of large RNN language models that operate at word or character level on LM-1-Billion, CommonCrawl, SQuAD, Gutenberg Books, and a customized corpus for this task and show that diversity of training data plays an important role in test performance. Further analysis also shows that our system successfully discovers important features of the context that decide the correct answer, indicating a good grasp of commonsense knowledge.
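The scoring procedure the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: it substitutes a toy add-one-smoothed bigram model for the paper's large word- and character-level RNN LMs, and the corpus, template format, and function names are invented for the example.

```python
import math
from collections import Counter

def train_bigram_lm(text):
    """Toy add-one-smoothed bigram LM (a stand-in for the paper's
    large word- and character-level RNN language models)."""
    tokens = text.lower().split()
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab = len(unigrams) + 1  # +1 reserves smoothing mass for unseen words
    def step_logprobs(seq):
        # Per-step log P(w_i | w_{i-1}) for each adjacent pair in seq.
        return [math.log((bigrams[(p, c)] + 1) / (unigrams[p] + vocab))
                for p, c in zip(seq, seq[1:])]
    return step_logprobs

def best_candidate(template, candidates, step_logprobs, partial=True):
    """Substitute each candidate into the '_' slot and return the one
    the LM scores highest.  With partial=True, only the tokens *after*
    the substitution are scored (conditioned on everything before them),
    the variant the paper reports working better than full scoring."""
    prefix, suffix = template.split("_", 1)
    def score(cand):
        seq = (prefix + cand + suffix).lower().split()
        steps = step_logprobs(seq)
        if partial:
            k = len((prefix + cand).split())  # index of first suffix token
            steps = steps[k - 1:]
        return sum(steps)
    return max(candidates, key=score)

# Hypothetical toy corpus and Winograd-style schema:
lm = train_bigram_lm("the trophy is too big . the trophy is heavy . "
                     "the suitcase was small .")
winner = best_candidate(
    "the trophy does not fit in the suitcase because the _ is too big",
    ["trophy", "suitcase"], lm, partial=True)
```

Here the toy corpus makes "trophy is" more probable than "suitcase is", so the LM resolves the pronoun slot to "trophy"; the paper's full-vs-partial distinction shows up only in which per-step log-probabilities are summed.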

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Coreference Resolution | Winograd Schema Challenge | Accuracy | 63.7 | Ensemble of 14 LMs |
| Coreference Resolution | Winograd Schema Challenge | Accuracy | 62.6 | Word-level CNN+LSTM (partial scoring) |
| Coreference Resolution | Winograd Schema Challenge | Accuracy | 57.9 | Char-level CNN+LSTM (partial scoring) |
| Natural Language Understanding | PDP60 | Accuracy | 60.0 | Word-level CNN+LSTM (full scoring) |
| Natural Language Understanding | PDP60 | Accuracy | 53.3 | Word-level CNN+LSTM (partial scoring) |

Related Papers

Comparing Apples to Oranges: A Dataset & Analysis of LLM Humour Understanding from Traditional Puns to Topical Jokes (2025-07-17)
The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations (2025-07-17)
HATS: Hindi Analogy Test Set for Evaluating Reasoning in Large Language Models (2025-07-17)
Vision Language Action Models in Robotic Manipulation: A Systematic Review (2025-07-14)
LoSiA: Efficient High-Rank Fine-Tuning via Subnet Localization and Optimization (2025-07-06)
MateInfoUB: A Real-World Benchmark for Testing LLMs in Competitive, Multilingual, and Multimodal Educational Tasks (2025-07-03)
A Survey on Vision-Language-Action Models for Autonomous Driving (2025-06-30)
State and Memory is All You Need for Robust and Reliable AI Agents (2025-06-30)