Sho Takase, Jun Suzuki, Masaaki Nagata
This paper proposes a state-of-the-art recurrent neural network (RNN) language model that combines probability distributions computed not only from the final RNN layer but also from intermediate layers. The proposed method raises the expressive power of a language model, following the matrix factorization interpretation of language modeling introduced by Yang et al. (2018). It improves on the current state-of-the-art language model and achieves the best perplexity on Penn Treebank and WikiText-2, the standard benchmark datasets. Moreover, we show that the proposed method also benefits two application tasks: machine translation and headline generation. Our code is publicly available at: https://github.com/nttcslab-nlp/doc_lm.
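The core idea can be illustrated with a short sketch: each RNN layer's hidden state is projected to the vocabulary, converted into a softmax distribution, and the per-layer distributions are mixed with learned weights, so the resulting log-probability matrix can have higher rank than a single final-layer softmax allows (the mixture-of-softmaxes view). The PyTorch sketch below is a minimal, hypothetical rendering of this idea; the class name `DOCHead`, its constructor arguments, and the weighting scheme are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DOCHead(nn.Module):
    """Illustrative direct-output-connection head (hypothetical names).

    Computes a softmax distribution from the hidden state of every RNN
    layer, not just the last one, and mixes the distributions with
    learned weights, raising the rank of the log-probability matrix.
    """

    def __init__(self, hidden_size: int, vocab_size: int, num_layers: int,
                 mixtures_per_layer: int = 1):
        super().__init__()
        self.mixtures_per_layer = mixtures_per_layer
        self.k = num_layers * mixtures_per_layer  # total mixture components
        # One projection to the vocabulary per mixture component.
        self.proj = nn.ModuleList(
            [nn.Linear(hidden_size, vocab_size) for _ in range(self.k)]
        )
        # Mixture weights (prior) conditioned on the final layer's state.
        self.prior = nn.Linear(hidden_size, self.k)

    def forward(self, layer_states: list) -> torch.Tensor:
        # layer_states: one (batch, hidden_size) tensor per RNN layer.
        pi = F.softmax(self.prior(layer_states[-1]), dim=-1)  # (batch, k)
        probs = torch.zeros(1)
        for i, h in enumerate(layer_states):
            for j in range(self.mixtures_per_layer):
                c = i * self.mixtures_per_layer + j
                p_c = F.softmax(self.proj[c](h), dim=-1)  # (batch, vocab)
                probs = probs + pi[:, c:c + 1] * p_c      # weighted mixture
        return torch.log(probs)  # log-probabilities over the vocabulary


# Toy usage: hidden states from a 3-layer RNN, vocabulary of 10 words.
states = [torch.randn(2, 16) for _ in range(3)]
head = DOCHead(hidden_size=16, vocab_size=10, num_layers=3)
log_p = head(states)  # shape (2, 10); exp of each row sums to 1
```

Because each component is a full softmax and the mixture weights sum to one, the combined output remains a valid probability distribution while drawing on information from every layer.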
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Language Modelling | Penn Treebank (Word Level) | Test perplexity | 47.17 | AWD-LSTM-DOC x5 |
| Language Modelling | Penn Treebank (Word Level) | Validation perplexity | 48.63 | AWD-LSTM-DOC x5 |
| Language Modelling | Penn Treebank (Word Level) | Test perplexity | 52.38 | AWD-LSTM-DOC |
| Language Modelling | Penn Treebank (Word Level) | Validation perplexity | 54.12 | AWD-LSTM-DOC |
| Language Modelling | WikiText-2 | Test perplexity | 53.09 | AWD-LSTM-DOC x5 |
| Language Modelling | WikiText-2 | Validation perplexity | 54.19 | AWD-LSTM-DOC x5 |
| Language Modelling | WikiText-2 | Test perplexity | 58.03 | AWD-LSTM-DOC |
| Language Modelling | WikiText-2 | Validation perplexity | 60.29 | AWD-LSTM-DOC |
| Constituency Parsing | Penn Treebank | F1 score | 94.47 | LSTM Encoder-Decoder + LSTM-LM |