Papers With Code

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Deep contextualized word representations

Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer

Published: 2018-02-15 (NAACL 2018)

Tasks: Question Answering, Only Connect Walls Dataset Task 1 (Grouping), Sentiment Analysis, Coreference Resolution, Natural Language Inference, Semantic Role Labeling, Conversational Response Selection, Named Entity Recognition (NER), Citation Intent Classification, Language Modelling

Abstract

We introduce a new type of deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy). Our word vectors are learned functions of the internal states of a deep bidirectional language model (biLM), which is pre-trained on a large text corpus. We show that these representations can be easily added to existing models and significantly improve the state of the art across six challenging NLP problems, including question answering, textual entailment and sentiment analysis. We also present an analysis showing that exposing the deep internals of the pre-trained network is crucial, allowing downstream models to mix different types of semi-supervision signals.
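The abstract's central idea, that the word vectors are "learned functions of the internal states" of the biLM, corresponds to collapsing all biLM layers into one vector per token via a task-specific softmax-normalized weighting plus a scalar scale. A minimal NumPy sketch of that combination step (function and variable names are illustrative, not from the paper's code):

```python
import numpy as np

def elmo_combine(layer_states, s_logits, gamma):
    """Collapse biLM layer activations into one ELMo vector per token.

    layer_states: (L, T, D) array -- L biLM layers, T tokens,
                  D-dimensional hidden states per token.
    s_logits:     (L,) unnormalized task-specific layer weights.
    gamma:        scalar letting the downstream task rescale the vector.
    """
    s = np.exp(s_logits - s_logits.max())   # softmax-normalize the weights
    s = s / s.sum()
    # Weighted sum over layers: ELMo_k = gamma * sum_j s_j * h_{k,j}
    return gamma * np.tensordot(s, layer_states, axes=1)  # shape (T, D)

# Toy example: 3 layers, 4 tokens, 5-dim states.
h = np.random.randn(3, 4, 5)
vecs = elmo_combine(h, np.zeros(3), gamma=1.0)
# Equal logits give uniform weights, so this reduces to the layer mean.
assert np.allclose(vecs, h.mean(axis=0))
```

Because the softmax weights are learned per task, this is how "exposing the deep internals" pays off: each downstream model can emphasize lower (more syntactic) or higher (more semantic) biLM layers as needed.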

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Question Answering | SQuAD1.1 dev | F1 | 85.6 | BiDAF + Self Attention + ELMo |
| Question Answering | SQuAD1.1 | EM | 81.003 | BiDAF + Self Attention + ELMo (ensemble) |
| Question Answering | SQuAD1.1 | F1 | 87.432 | BiDAF + Self Attention + ELMo (ensemble) |
| Question Answering | SQuAD1.1 | EM | 78.58 | BiDAF + Self Attention + ELMo (single model) |
| Question Answering | SQuAD1.1 | F1 | 85.833 | BiDAF + Self Attention + ELMo (single model) |
| Question Answering | SQuAD2.0 | EM | 63.372 | BiDAF + Self Attention + ELMo (single model) |
| Question Answering | SQuAD2.0 | F1 | 66.251 | BiDAF + Self Attention + ELMo (single model) |
| Word Sense Disambiguation | SemEval 2007 (supervised) | F1 | 62.2 | ELMo |
| Word Sense Disambiguation | SemEval 2013 (supervised) | F1 | 66.2 | ELMo |
| Word Sense Disambiguation | SemEval 2015 (supervised) | F1 | 71.3 | ELMo |
| Word Sense Disambiguation | Senseval 2 (supervised) | F1 | 71.6 | ELMo |
| Word Sense Disambiguation | Senseval 3 (supervised) | F1 | 69.6 | ELMo |
| Natural Language Inference | SNLI | % Test Accuracy | 89.3 | ESIM + ELMo Ensemble |
| Natural Language Inference | SNLI | % Train Accuracy | 92.1 | ESIM + ELMo Ensemble |
| Natural Language Inference | SNLI | % Test Accuracy | 88.7 | ESIM + ELMo |
| Natural Language Inference | SNLI | % Train Accuracy | 91.6 | ESIM + ELMo |
| Semantic Role Labeling | OntoNotes | F1 | 84.6 | He et al., 2017 + ELMo |
| Sentiment Analysis | SST-5 (fine-grained classification) | Accuracy | 54.7 | BCN+ELMo |
| Named Entity Recognition (NER) | CoNLL 2003 (English) | F1 | 92.22 | BiLSTM-CRF+ELMo |
| Named Entity Recognition (NER) | CoNLL++ | F1 | 93.42 | BiLSTM-CRF+ELMo |
| Text Classification | ACL-ARC | Macro-F1 | 54.6 | BiLSTM-Attention + ELMo |

Related Papers

- Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
- From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
- Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering (2025-07-17)
- Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It (2025-07-17)
- City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
- AdaptiSent: Context-Aware Adaptive Attention for Multimodal Aspect-Based Sentiment Analysis (2025-07-17)
- Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
- VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)