Deep contextualized word representations

Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer

2018-02-15, NAACL 2018

Tasks: Question Answering, Only Connect Walls Dataset Task 1 (Grouping), Sentiment Analysis, Coreference Resolution, Natural Language Inference, Semantic Role Labeling, Conversational Response Selection, Named Entity Recognition (NER), Citation Intent Classification, Language Modelling


Abstract

We introduce a new type of deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy). Our word vectors are learned functions of the internal states of a deep bidirectional language model (biLM), which is pre-trained on a large text corpus. We show that these representations can be easily added to existing models and significantly improve the state of the art across six challenging NLP problems, including question answering, textual entailment and sentiment analysis. We also present an analysis showing that exposing the deep internals of the pre-trained network is crucial, allowing downstream models to mix different types of semi-supervision signals.
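The abstract describes the word vectors only as "learned functions of the internal states" of the biLM. As we understand the full paper, that function is a task-specific weighted sum: for token k, the layer states h_{k,0..L} (layer 0 being the context-independent token layer) are combined with softmax-normalized weights s_j and then scaled by a task-specific scalar gamma. The sketch below illustrates only that combination step. It is not the authors' implementation; the function names `softmax` and `scalar_mix` and the parameters `raw_weights` and `gamma` are hypothetical, and the layer states are plain toy vectors rather than real biLM outputs.

```python
import math
from typing import List

Vector = List[float]


def softmax(weights: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scalars."""
    m = max(weights)
    exps = [math.exp(w - m) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def scalar_mix(layer_states: List[Vector], raw_weights: List[float], gamma: float) -> Vector:
    """Collapse the L+1 layer states of one token into a single vector (sketch).

    layer_states[j] stands in for h_{k,j}, the state of token k at layer j,
    with j = 0 as the token layer. raw_weights are the unnormalized per-layer
    scalars a downstream task would learn (hypothetical name); gamma is the
    task-specific scale applied to the whole mixed vector.
    """
    if len(layer_states) != len(raw_weights):
        raise ValueError("need exactly one weight per layer")
    dim = len(layer_states[0])
    if any(len(h) != dim for h in layer_states):
        raise ValueError("all layer states must share one dimension")

    s = softmax(raw_weights)  # s_j >= 0 and sum_j s_j == 1
    mixed = [0.0] * dim
    for s_j, h_j in zip(s, layer_states):
        for i, value in enumerate(h_j):
            mixed[i] += s_j * value  # accumulate s_j * h_{k,j}
    return [gamma * v for v in mixed]  # gamma * sum_j s_j * h_{k,j}


if __name__ == "__main__":
    # Toy example: a token layer plus two contextual layers, 2-dimensional.
    states = [[3.0, 0.0], [0.0, 3.0], [3.0, 3.0]]
    print(scalar_mix(states, [0.0, 0.0, 0.0], gamma=1.0))   # equal layer weights
    print(scalar_mix(states, [5.0, 0.0, 0.0], gamma=0.5))   # favours the token layer
```

With all raw weights equal, the mix is simply the layer average; pushing one weight up lets a downstream task lean on a particular layer. Because the softmax weights and gamma are learned per task, this is one way to read the abstract's point that exposing all internal layers, rather than only the top one, lets each downstream model choose its own blend of signals.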

Results

Task	Dataset	Metric	Value	Model
Question Answering	SQuAD1.1 dev	F1	85.6	BiDAF + Self Attention + ELMo
Question Answering	SQuAD1.1	EM	81.003	BiDAF + Self Attention + ELMo (ensemble)
Question Answering	SQuAD1.1	F1	87.432	BiDAF + Self Attention + ELMo (ensemble)
Question Answering	SQuAD1.1	EM	78.58	BiDAF + Self Attention + ELMo (single model)
Question Answering	SQuAD1.1	F1	85.833	BiDAF + Self Attention + ELMo (single model)
Question Answering	SQuAD2.0	EM	63.372	BiDAF + Self Attention + ELMo (single model)
Question Answering	SQuAD2.0	F1	66.251	BiDAF + Self Attention + ELMo (single model)
Word Sense Disambiguation	Supervised: SemEval 2007	F1	62.2	ELMo
Word Sense Disambiguation	Supervised: SemEval 2013	F1	66.2	ELMo
Word Sense Disambiguation	Supervised: SemEval 2015	F1	71.3	ELMo
Word Sense Disambiguation	Supervised: Senseval 2	F1	71.6	ELMo
Word Sense Disambiguation	Supervised: Senseval 3	F1	69.6	ELMo
Natural Language Inference	SNLI	% Test Accuracy	89.3	ESIM + ELMo Ensemble
Natural Language Inference	SNLI	% Train Accuracy	92.1	ESIM + ELMo Ensemble
Natural Language Inference	SNLI	% Test Accuracy	88.7	ESIM + ELMo
Natural Language Inference	SNLI	% Train Accuracy	91.6	ESIM + ELMo
Semantic Role Labeling	OntoNotes	F1	84.6	He et al., 2017 + ELMo
Sentiment Analysis	SST-5 Fine-grained classification	Accuracy	54.7	BCN+ELMo
Named Entity Recognition (NER)	CoNLL 2003 (English)	F1	92.22	BiLSTM-CRF+ELMo
Named Entity Recognition (NER)	CoNLL++	F1	93.42	BiLSTM-CRF+ELMo
Text Classification	ACL-ARC	Macro-F1	54.6	BiLSTM-Attention + ELMo
