
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Learned in Translation: Contextualized Word Vectors

Bryan McCann, James Bradbury, Caiming Xiong, Richard Socher

2017-08-01 | NeurIPS 2017 12
Tasks: Text Classification, Machine Translation, Question Answering, Sentiment Analysis, Translation, General Classification

Abstract

Computer vision has benefited from initializing multiple deep layers with weights pretrained on large supervised training sets like ImageNet. Natural language processing (NLP) typically sees initialization of only the lowest layer of deep models with pretrained word vectors. In this paper, we use a deep LSTM encoder from an attentional sequence-to-sequence model trained for machine translation (MT) to contextualize word vectors. We show that adding these context vectors (CoVe) improves performance over using only unsupervised word and character vectors on a wide variety of common NLP tasks: sentiment analysis (SST, IMDb), question classification (TREC), entailment (SNLI), and question answering (SQuAD). For fine-grained sentiment analysis and entailment, CoVe improves performance of our baseline models to the state of the art.
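
The data flow described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: `toy_bidirectional_encoder` is a hypothetical stand-in for the pretrained MT-LSTM encoder (it is a simple tanh recurrence, not an LSTM and not trained on translation), the word vectors are made-up numbers, and "adding" context vectors is assumed here to mean concatenating them with the original word vector for each token.

```python
import math
from typing import Callable, List

Vector = List[float]


def toy_bidirectional_encoder(word_vectors: List[Vector]) -> List[Vector]:
    """Hypothetical stand-in for a frozen, pretrained MT encoder.

    NOT a real LSTM: each direction is a plain tanh recurrence. It only
    mimics the key property that every output depends on the whole sentence.
    """
    if not word_vectors:
        return []

    def recurrent_pass(seq: List[Vector]) -> List[Vector]:
        state = [0.0] * len(seq[0])
        outputs = []
        for vec in seq:
            # Mix the previous state with the current word vector.
            state = [math.tanh(0.5 * s + 0.5 * v) for s, v in zip(state, vec)]
            outputs.append(state)
        return outputs

    forward = recurrent_pass(word_vectors)
    backward = recurrent_pass(word_vectors[::-1])[::-1]
    # Concatenate forward and backward states for each position.
    return [f + b for f, b in zip(forward, backward)]


def add_context_vectors(
    word_vectors: List[Vector],
    encoder: Callable[[List[Vector]], List[Vector]],
) -> List[Vector]:
    """Concatenate each word vector with its context vector (assumed combination)."""
    context_vectors = encoder(word_vectors)
    return [w + c for w, c in zip(word_vectors, context_vectors)]


# Made-up 2-dimensional word vectors, for illustration only.
vocab = {
    "the": [0.1, 0.2],
    "river": [0.9, -0.3],
    "bank": [-0.4, 0.5],
    "loan": [0.3, 0.8],
}
features_a = add_context_vectors(
    [vocab[w] for w in ["the", "river", "bank"]], toy_bidirectional_encoder
)
features_b = add_context_vectors(
    [vocab[w] for w in ["the", "bank", "loan"]], toy_bidirectional_encoder
)
```

In this sketch the static part of the representation for "bank" is identical in both sentences, while the contextual part differs because the encoder has seen different neighbouring words. A downstream task model would consume these concatenated features in place of the word vectors alone.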

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Question Answering | SQuAD1.1 dev | EM | 71.3 | DCN (Char + CoVe) |
| Question Answering | SQuAD1.1 dev | F1 | 79.9 | DCN (Char + CoVe) |
| Question Answering | SQuAD1.1 | EM | 71.3 | DCN + Char + CoVe |
| Question Answering | SQuAD1.1 | F1 | 79.9 | DCN + Char + CoVe |
| Natural Language Inference | SNLI | % Test Accuracy | 88.1 | Biattentive Classification Network + CoVe + Char |
| Natural Language Inference | SNLI | % Train Accuracy | 88.5 | Biattentive Classification Network + CoVe + Char |
| Sentiment Analysis | SST-5 Fine-grained classification | Accuracy | 53.7 | BCN+Char+CoVe |
| Sentiment Analysis | SST-2 Binary classification | Accuracy | 90.3 | BCN+Char+CoVe |
| Sentiment Analysis | IMDb | Accuracy | 91.8 | BCN+Char+CoVe |
| Text Classification | TREC-6 | Error | 4.2 | CoVe |
| Classification | TREC-6 | Error | 4.2 | CoVe |
