Exploring the Limits of Language Modeling

Rafal Jozefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, Yonghui Wu

2016-02-07Language Modelling

Paper PDF Code Code Code Code Code Code Code Code Code Code

Abstract

In this work we explore recent advances in Recurrent Neural Networks for large scale Language Modeling, a task central to language understanding. We extend current models to deal with two key challenges present in this task: corpora and vocabulary sizes, and complex, long term structure of language. We perform an exhaustive study on techniques such as character Convolutional Neural Networks or Long-Short Term Memory, on the One Billion Word Benchmark. Our best single model significantly improves state-of-the-art perplexity from 51.3 down to 30.0 (whilst reducing the number of parameters by a factor of 20), while an ensemble of models sets a new record by improving perplexity from 41.0 down to 23.7. We also release these models for the NLP and ML community to study and improve upon.

Results

Task	Dataset	Metric	Value	Model
Language Modelling	One Billion Word	PPL	23.7	10 LSTM+CNN inputs + SNM10-SKIP (ensemble)
Language Modelling	One Billion Word	PPL	30	LSTM-8192-1024 + CNN Input
Language Modelling	One Billion Word	PPL	30.6	LSTM-8192-1024

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment2025-07-21 Making Language Model a Hierarchical Classifier and Generator2025-07-17 VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning2025-07-17 The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations2025-07-17 Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities2025-07-17 Assay2Mol: large language model-based drug design using BioAssay context2025-07-16 Describe Anything Model for Visual Question Answering on Text-rich Images2025-07-16 InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing2025-07-16