
Improved Language Modeling by Decoding the Past

Siddhartha Brahma

2018-08-14 · ACL 2019 · Language Modelling

Paper · PDF

Abstract

Highly regularized LSTMs achieve impressive results on several benchmark datasets in language modeling. We propose a new regularization method based on decoding the last token in the context using the predicted distribution of the next token. This biases the model towards retaining more contextual information, in turn improving its ability to predict the next token. With negligible overhead in the number of parameters and training time, our Past Decode Regularization (PDR) method achieves a word level perplexity of 55.6 on the Penn Treebank and 63.5 on the WikiText-2 datasets using a single softmax. We also show gains by using PDR in combination with a mixture-of-softmaxes, achieving a word level perplexity of 53.8 and 60.5 on these datasets. In addition, our method achieves 1.169 bits-per-character on the Penn Treebank Character dataset for character level language modeling. These results constitute a new state-of-the-art in their respective settings.
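The abstract compresses the mechanism into one sentence, so a small sketch may help. Below is a minimal PyTorch reconstruction of the idea as stated: the predicted next-token distribution is used to form an expected input embedding, which is then scored against the vocabulary to "decode" the last token of the context. The function name pdr_loss, the embedding-tied decoding head, and the coefficient lambda_pdr are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn.functional as F

def pdr_loss(next_token_logits, embedding_weight, last_context_tokens):
    """Illustrative Past Decode Regularization term (a reconstruction from the abstract).

    next_token_logits:   (batch, seq, vocab) scores for predicting token t+1
    embedding_weight:    (vocab, dim) the model's input embedding matrix
    last_context_tokens: (batch, seq) token t, the last token of each context

    The predicted next-token distribution is turned into an expected
    embedding; scoring that embedding against the vocabulary "decodes"
    the last context token, biasing the model to retain contextual
    information.
    """
    probs = F.softmax(next_token_logits, dim=-1)         # (B, T, V)
    expected_emb = probs @ embedding_weight              # (B, T, D)
    # Embedding-tied decoding head (an assumption; the paper's exact head may differ).
    decode_logits = expected_emb @ embedding_weight.t()  # (B, T, V)
    return F.cross_entropy(
        decode_logits.reshape(-1, decode_logits.size(-1)),
        last_context_tokens.reshape(-1),
    )

# Sketch of use during training, with lambda_pdr a tuned coefficient:
# total_loss = ce_next_token + lambda_pdr * pdr_loss(logits, model.embedding.weight, inputs)
```

Because the term reuses the model's existing embedding matrix and softmax output, it adds no new parameters, which is consistent with the abstract's claim of negligible overhead in parameters and training time.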

Results

All rows use dynamic evaluation at test time, which is why these perplexities are lower than the static single-model numbers quoted in the abstract.

Task               | Dataset                         | Metric                   | Value | Model
Language Modelling | Penn Treebank (Word Level)      | Test perplexity          | 47.3  | Past Decode Reg. + AWD-LSTM-MoS + dyn. eval.
Language Modelling | Penn Treebank (Word Level)      | Validation perplexity    | 48    | Past Decode Reg. + AWD-LSTM-MoS + dyn. eval.
Language Modelling | Penn Treebank (Character Level) | Bits per Character (BPC) | 1.169 | Past Decode Reg. + AWD-LSTM-MoS + dyn. eval.
Language Modelling | WikiText-2                      | Test perplexity          | 40.3  | Past Decode Reg. + AWD-LSTM-MoS + dyn. eval.
Language Modelling | WikiText-2                      | Validation perplexity    | 42    | Past Decode Reg. + AWD-LSTM-MoS + dyn. eval.

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations (2025-07-17)
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities (2025-07-17)
Assay2Mol: large language model-based drug design using BioAssay context (2025-07-16)
Describe Anything Model for Visual Question Answering on Text-rich Images (2025-07-16)
InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing (2025-07-16)