Attention Is (not) All You Need for Commonsense Reasoning

Tassilo Klein, Moin Nabi

2019-05-31 | ACL 2019 | Coreference Resolution, Natural Language Understanding

Abstract

The recently introduced BERT model exhibits strong performance on several language understanding benchmarks. In this paper, we describe a simple re-implementation of BERT for commonsense reasoning. We show that the attentions produced by BERT can be directly utilized for tasks such as the Pronoun Disambiguation Problem and Winograd Schema Challenge. Our proposed attention-guided commonsense reasoning method is conceptually simple yet empirically powerful. Experimental analysis on multiple datasets demonstrates that our proposed system performs remarkably well in all cases while outperforming the previously reported state of the art by a margin. While results suggest that BERT seems to implicitly learn to establish complex relationships between entities, solving commonsense reasoning tasks might require more than unsupervised models learned from huge text corpora.
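The attention-guided idea can be illustrated with a small sketch of a Maximum Attention Score (MAS), the name used for the BERT-based entries in the results table. This is not the authors' code. It assumes the attention from the pronoun to each candidate has already been extracted per layer and head, and it applies one reading of the scoring rule: at each (layer, head) position only the candidate with the largest attention keeps its value, and each candidate's score is its share of the total retained attention. The function name `maximum_attention_scores` and the toy numbers are hypothetical.

```python
def maximum_attention_scores(attn):
    """Score candidates from per-layer, per-head attention values.

    attn maps each candidate name to a list of layers, each layer being
    a list of per-head attention values (pronoun -> candidate).
    Assumption: all candidates share the same layer/head layout.
    """
    names = list(attn)
    layout = attn[names[0]]
    retained = {name: 0.0 for name in names}
    for layer in range(len(layout)):
        for head in range(len(layout[layer])):
            # Mask step: only the candidate with the largest attention
            # at this (layer, head) keeps its value (ties go to the first).
            winner = max(names, key=lambda n: attn[n][layer][head])
            retained[winner] += attn[winner][layer][head]
    total = sum(retained.values())
    if total == 0:
        # Degenerate case with no attention mass: fall back to uniform scores.
        return {name: 1.0 / len(names) for name in names}
    # Normalize so the scores over all candidates sum to 1.
    return {name: retained[name] / total for name in names}


# Made-up attention values for two candidates, 2 layers x 2 heads.
toy_attention = {
    "trophy": [[0.5, 0.1], [0.3, 0.2]],
    "suitcase": [[0.2, 0.4], [0.1, 0.1]],
}
scores = maximum_attention_scores(toy_attention)
prediction = max(scores, key=scores.get)
```

In this toy case "trophy" wins three of the four (layer, head) positions, so it receives the larger score and becomes the predicted referent. With a real model, the input values would come from the attention maps of a pretrained BERT, which this sketch does not cover.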

Results

Task	Dataset	Metric	Value	Model
Coreference Resolution	Winograd Schema Challenge	Accuracy	60.3	BERT-base 110M + MAS
Coreference Resolution	Winograd Schema Challenge	Accuracy	52.8	USSM + Supervised DeepNet + KB
Coreference Resolution	Winograd Schema Challenge	Accuracy	52.0	USSM + KB
Natural Language Understanding	PDP60	Accuracy	68.3	BERT-base 110M + MAS
Natural Language Understanding	PDP60	Accuracy	66.7	USSM + Supervised DeepNet + 3 Knowledge Bases
Natural Language Understanding	PDP60	Accuracy	53.3	USSM + Supervised DeepNet
