
Data sourced from the PWC Archive (CC-BY-SA 4.0).


Breaking the Softmax Bottleneck: A High-Rank RNN Language Model

Zhilin Yang, Zihang Dai, Ruslan Salakhutdinov, William W. Cohen

Published: 2017-11-10 · ICLR 2018 · Tasks: Word Embeddings, Language Modelling
Links: Paper · PDF · Code (official)

Abstract

We formulate language modeling as a matrix factorization problem, and show that the expressiveness of Softmax-based models (including the majority of neural language models) is limited by a Softmax bottleneck. Given that natural language is highly context-dependent, this further implies that in practice Softmax with distributed word embeddings does not have enough capacity to model natural language. We propose a simple and effective method to address this issue, and improve the state-of-the-art perplexities on Penn Treebank and WikiText-2 to 47.69 and 40.68 respectively. The proposed method also excels on the large-scale 1B Word dataset, outperforming the baseline by over 5.6 points in perplexity.
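The proposed method is the Mixture of Softmaxes (MoS) output layer used by the AWD-LSTM-MoS models in the results below: instead of a single softmax over logits (whose log-probability matrix has rank at most the embedding size), the model mixes K softmax distributions, lifting the rank ceiling. Below is a minimal PyTorch sketch of the idea; the class name, the tanh nonlinearity, and the default of 15 components are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfSoftmaxes(nn.Module):
    """Minimal sketch of a Mixture-of-Softmaxes (MoS) output layer.

    A single softmax over h @ W^T yields a log-probability matrix of
    rank at most d_model (the softmax bottleneck). Mixing K softmaxes
    produces distributions that are generally not expressible as a
    single softmax of rank-d logits.
    """

    def __init__(self, d_model: int, vocab_size: int, n_components: int = 15):
        super().__init__()
        self.n_components = n_components
        # Prior (mixture weights) over the K softmax components.
        self.prior = nn.Linear(d_model, n_components)
        # Projects the context vector into K separate component contexts.
        self.latent = nn.Linear(d_model, n_components * d_model)
        # Output projection (tied to the word embeddings in typical LMs).
        self.decoder = nn.Linear(d_model, vocab_size)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, d_model) context vectors from the RNN.
        batch = h.size(0)
        pi = F.softmax(self.prior(h), dim=-1)           # (batch, K)
        hk = torch.tanh(self.latent(h))                 # (batch, K*d)
        hk = hk.view(batch, self.n_components, -1)      # (batch, K, d)
        probs_k = F.softmax(self.decoder(hk), dim=-1)   # (batch, K, V)
        # Mix the K softmax distributions with the prior weights.
        probs = torch.einsum('bk,bkv->bv', pi, probs_k) # (batch, V)
        return probs
```

Note that the mixing happens in probability space, not logit space; averaging logits would collapse back into a single softmax and restore the rank bound.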

Results

Task | Dataset | Metric | Value | Model
Language Modelling | Penn Treebank (Word Level) | Test perplexity | 47.69 | AWD-LSTM-MoS + dynamic eval
Language Modelling | Penn Treebank (Word Level) | Validation perplexity | 48.33 | AWD-LSTM-MoS + dynamic eval
Language Modelling | Penn Treebank (Word Level) | Test perplexity | 54.44 | AWD-LSTM-MoS
Language Modelling | Penn Treebank (Word Level) | Validation perplexity | 56.54 | AWD-LSTM-MoS
Language Modelling | WikiText-2 | Test perplexity | 40.68 | AWD-LSTM-MoS + dynamic eval
Language Modelling | WikiText-2 | Validation perplexity | 42.41 | AWD-LSTM-MoS + dynamic eval
Language Modelling | WikiText-2 | Test perplexity | 61.45 | AWD-LSTM-MoS
Language Modelling | WikiText-2 | Validation perplexity | 63.88 | AWD-LSTM-MoS
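For reference, the perplexity values above are the exponential of the average per-token negative log-likelihood on the evaluation set; a minimal helper (the function name is illustrative):

```python
import math

def perplexity(total_nll: float, n_tokens: int) -> float:
    """Perplexity = exp(average per-token NLL in nats).

    For example, an average NLL of about 3.86 nats corresponds
    to a perplexity of roughly 47.7, matching the scale of the
    Penn Treebank results reported above.
    """
    return math.exp(total_nll / n_tokens)
```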

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations (2025-07-17)
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities (2025-07-17)
Assay2Mol: large language model-based drug design using BioAssay context (2025-07-16)
Describe Anything Model for Visual Question Answering on Text-rich Images (2025-07-16)
InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing (2025-07-16)