Learned in Translation: Contextualized Word Vectors

Bryan McCann, James Bradbury, Caiming Xiong, Richard Socher

2017-08-01, NeurIPS 2017

Tasks: Text Classification, Machine Translation, Question Answering, Sentiment Analysis, Translation, General Classification


Abstract

Computer vision has benefited from initializing multiple deep layers with weights pretrained on large supervised training sets like ImageNet. Natural language processing (NLP) typically sees initialization of only the lowest layer of deep models with pretrained word vectors. In this paper, we use a deep LSTM encoder from an attentional sequence-to-sequence model trained for machine translation (MT) to contextualize word vectors. We show that adding these context vectors (CoVe) improves performance over using only unsupervised word and character vectors on a wide variety of common NLP tasks: sentiment analysis (SST, IMDb), question classification (TREC), entailment (SNLI), and question answering (SQuAD). For fine-grained sentiment analysis and entailment, CoVe improves performance of our baseline models to the state of the art.
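The abstract describes CoVe as the output of the encoder of a translation model run over pretrained word vectors. The toy sketch below illustrates that pipeline in pure Python. It assumes the formulation in the full paper, where the encoder (MT-LSTM) is a two-layer bidirectional LSTM, CoVe(w) = MT-LSTM(GloVe(w)), and downstream models receive the concatenation [GloVe(w); CoVe(w)]. Everything else is an illustrative assumption: the names `MTLSTM` and `cove_features` are hypothetical, the weights are random rather than trained on translation data, the "GloVe" table holds three random vectors, and the dimensions are tiny.

```python
import math
import random

GATES = ("i", "f", "o", "g")  # input, forget, output gates and candidate cell
PUNCTUATION_FREE = True  # (unused marker removed below)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


class LSTMCell:
    """Plain LSTM cell; weights are random and untrained (toy assumption)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        n_in = input_dim + hidden_dim  # each gate reads [x_t; h_{t-1}]
        self.W = {k: [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                      for _ in range(hidden_dim)] for k in GATES}
        self.b = {k: [0.0] * hidden_dim for k in GATES}

    def step(self, x, h_prev, c_prev):
        z = x + h_prev  # list concatenation [x_t; h_{t-1}]
        pre = {k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])]
               for k in GATES}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c

    def run(self, xs):
        """Run left to right over a sequence, returning every hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        states = []
        for x in xs:
            h, c = self.step(x, h, c)
            states.append(h)
        return states


class BiLSTMLayer:
    def __init__(self, input_dim, hidden_dim, rng):
        self.fwd = LSTMCell(input_dim, hidden_dim, rng)
        self.bwd = LSTMCell(input_dim, hidden_dim, rng)

    def __call__(self, xs):
        fwd = self.fwd.run(xs)
        bwd = self.bwd.run(xs[::-1])[::-1]  # re-align backward states with tokens
        return [hf + hb for hf, hb in zip(fwd, bwd)]


class MTLSTM:
    """Hypothetical stand-in for the MT encoder: stacked BiLSTM layers."""

    def __init__(self, emb_dim, hidden_dim, rng, num_layers=2):
        in_dims = [emb_dim] + [2 * hidden_dim] * (num_layers - 1)
        self.layers = [BiLSTMLayer(d, hidden_dim, rng) for d in in_dims]

    def __call__(self, xs):
        for layer in self.layers:
            xs = layer(xs)
        return xs


def cove_features(tokens, glove, encoder):
    """Return [GloVe(w); CoVe(w)] per token, where CoVe(w) = MT-LSTM(GloVe(w))."""
    embedded = [glove[t] for t in tokens]
    context = encoder(embedded)
    return [e + c for e, c in zip(embedded, context)]


rng = random.Random(0)
EMB_DIM, HID_DIM = 4, 3
# Toy random vectors standing in for pretrained GloVe embeddings (assumption).
glove = {w: [rng.uniform(-1, 1) for _ in range(EMB_DIM)]
         for w in ("the", "river", "bank")}
encoder = MTLSTM(EMB_DIM, HID_DIM, rng)

feats = cove_features(["the", "river", "bank"], glove, encoder)
print(len(feats), len(feats[0]))  # 3 10  (4 GloVe dims + 2 * 3 context dims)
```

Running the same encoder on "the bank" and on "the river bank" gives the token "bank" an identical GloVe half, while its context half will in general differ, which is the sense in which the vectors are contextualized. A practical version would use a tensor library and load trained encoder weights instead of these hand-rolled loops.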

Results

Task	Dataset	Metric	Value (%)	Model
Question Answering	SQuAD1.1 dev	EM	71.3	DCN (Char + CoVe)
Question Answering	SQuAD1.1 dev	F1	79.9	DCN (Char + CoVe)
Question Answering	SQuAD1.1	EM	71.3	DCN + Char + CoVe
Question Answering	SQuAD1.1	F1	79.9	DCN + Char + CoVe
Natural Language Inference	SNLI	% Test Accuracy	88.1	Biattentive Classification Network + CoVe + Char
Natural Language Inference	SNLI	% Train Accuracy	88.5	Biattentive Classification Network + CoVe + Char
Sentiment Analysis	SST-5 Fine-grained classification	Accuracy	53.7	BCN+Char+CoVe
Sentiment Analysis	SST-2 Binary classification	Accuracy	90.3	BCN+Char+CoVe
Sentiment Analysis	IMDb	Accuracy	91.8	BCN+Char+CoVe
Text Classification	TREC-6	Error	4.2	CoVe
Classification	TREC-6	Error	4.2	CoVe
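The Accuracy and Error rows are complementary: the TREC-6 error of 4.2 corresponds to 100 minus 4.2 = 95.8% accuracy. The SQuAD rows use exact match (EM) and token-level F1. As a rough illustration of what those two columns measure, the sketch below is modelled on the answer normalization of the SQuAD v1.1 evaluation script (lowercasing, stripping punctuation and the articles a/an/the, collapsing whitespace) and scores each prediction against its best-matching gold answer. It is not the official script, and the example questions and answers are made up.

```python
import re
import string
from collections import Counter

PUNCT = set(string.punctuation)


def normalize_answer(s):
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction, reference):
    return normalize_answer(prediction) == normalize_answer(reference)


def token_f1(prediction, reference):
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def squad_scores(predictions, references):
    """Corpus-level EM and F1 in percent; each question may have several gold answers."""
    em = f1 = 0.0
    for pred, golds in zip(predictions, references):
        em += max(exact_match(pred, g) for g in golds)  # best gold answer counts
        f1 += max(token_f1(pred, g) for g in golds)
    n = len(predictions)
    return 100.0 * em / n, 100.0 * f1 / n


# Made-up example: one exact hit, one partial overlap.
predictions = ["the Eiffel Tower", "in Paris"]
references = [["Eiffel Tower", "The Eiffel Tower in Paris"], ["Paris, France"]]
print(squad_scores(predictions, references))  # (50.0, 75.0)
```

F1 gives partial credit ("in Paris" shares one of two tokens with "Paris, France", so F1 = 0.5) where EM gives none, which is why F1 is consistently higher than EM in the SQuAD rows.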
