Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


A Simple Method for Commonsense Reasoning

Trieu H. Trinh, Quoc V. Le

Published: 2018-06-07
Tasks: Coreference Resolution, Common Sense Reasoning, Natural Language Understanding, Multiple-choice

Abstract

Commonsense reasoning is a long-standing challenge for deep learning. For example, it is difficult to use neural networks to tackle the Winograd Schema dataset (Levesque et al., 2011). In this paper, we present a simple method for commonsense reasoning with neural networks, using unsupervised learning. Key to our method is the use of language models, trained on a massive amount of unlabeled data, to score multiple choice questions posed by commonsense reasoning tests. On both Pronoun Disambiguation and Winograd Schema challenges, our models outperform previous state-of-the-art methods by a large margin, without using expensive annotated knowledge bases or hand-engineered features. We train an array of large RNN language models that operate at word or character level on LM-1-Billion, CommonCrawl, SQuAD, Gutenberg Books, and a customized corpus for this task and show that diversity of training data plays an important role in test performance. Further analysis also shows that our system successfully discovers important features of the context that decide the correct answer, indicating a good grasp of commonsense knowledge.
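The scoring procedure the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: it substitutes a toy add-one-smoothed bigram model for the paper's large word- and character-level RNN LMs, and the corpus, template format, and function names are invented for the example.

```python
import math
from collections import Counter

def train_bigram_lm(text):
    """Toy add-one-smoothed bigram LM (a stand-in for the paper's
    large word- and character-level RNN language models)."""
    tokens = text.lower().split()
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab = len(unigrams) + 1  # +1 reserves smoothing mass for unseen words
    def step_logprobs(seq):
        # Per-step log P(w_i | w_{i-1}) for each adjacent pair in seq.
        return [math.log((bigrams[(p, c)] + 1) / (unigrams[p] + vocab))
                for p, c in zip(seq, seq[1:])]
    return step_logprobs

def best_candidate(template, candidates, step_logprobs, partial=True):
    """Substitute each candidate into the '_' slot and return the one
    the LM scores highest.  With partial=True, only the tokens *after*
    the substitution are scored (conditioned on everything before them),
    the variant the paper reports working better than full scoring."""
    prefix, suffix = template.split("_", 1)
    def score(cand):
        seq = (prefix + cand + suffix).lower().split()
        steps = step_logprobs(seq)
        if partial:
            k = len((prefix + cand).split())  # index of first suffix token
            steps = steps[k - 1:]
        return sum(steps)
    return max(candidates, key=score)

# Hypothetical toy corpus and Winograd-style schema:
lm = train_bigram_lm("the trophy is too big . the trophy is heavy . "
                     "the suitcase was small .")
winner = best_candidate(
    "the trophy does not fit in the suitcase because the _ is too big",
    ["trophy", "suitcase"], lm, partial=True)
```

Here the toy corpus makes "trophy is" more probable than "suitcase is", so the LM resolves the pronoun slot to "trophy"; the paper's full-vs-partial distinction shows up only in which per-step log-probabilities are summed.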

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Coreference Resolution | Winograd Schema Challenge | Accuracy | 63.7 | Ensemble of 14 LMs |
| Coreference Resolution | Winograd Schema Challenge | Accuracy | 62.6 | Word-level CNN+LSTM (partial scoring) |
| Coreference Resolution | Winograd Schema Challenge | Accuracy | 57.9 | Char-level CNN+LSTM (partial scoring) |
| Natural Language Understanding | PDP60 | Accuracy | 60.0 | Word-level CNN+LSTM (full scoring) |
| Natural Language Understanding | PDP60 | Accuracy | 53.3 | Word-level CNN+LSTM (partial scoring) |

Related Papers

Comparing Apples to Oranges: A Dataset & Analysis of LLM Humour Understanding from Traditional Puns to Topical Jokes (2025-07-17)
The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations (2025-07-17)
HATS: Hindi Analogy Test Set for Evaluating Reasoning in Large Language Models (2025-07-17)
Vision Language Action Models in Robotic Manipulation: A Systematic Review (2025-07-14)
LoSiA: Efficient High-Rank Fine-Tuning via Subnet Localization and Optimization (2025-07-06)
MateInfoUB: A Real-World Benchmark for Testing LLMs in Competitive, Multilingual, and Multimodal Educational Tasks (2025-07-03)
A Survey on Vision-Language-Action Models for Autonomous Driving (2025-06-30)
State and Memory is All You Need for Robust and Reliable AI Agents (2025-06-30)