Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Improving Neural Language Modeling via Adversarial Training

Dilin Wang, Chengyue Gong, Qiang Liu

2019-06-10 · Machine Translation · Translation · Language Modelling

Paper · PDF · Code (official)

Abstract

Recently, substantial progress has been made in language modeling by using deep neural networks. However, in practice, large-scale neural language models have been shown to be prone to overfitting. In this paper, we present a simple yet highly effective adversarial training mechanism for regularizing neural language models. The idea is to introduce adversarial noise to the output embedding layer while training the models. We show that the optimal adversarial noise yields a simple closed-form solution, thus allowing us to develop a simple and time-efficient algorithm. Theoretically, we show that our adversarial mechanism effectively encourages the diversity of the embedding vectors, helping to increase the robustness of models. Empirically, we show that our method improves on the single-model state-of-the-art results for language modeling on Penn Treebank (PTB) and WikiText-2, achieving test perplexity scores of 46.01 and 38.07, respectively. When applied to machine translation, our method improves over various transformer-based translation baselines in BLEU scores on the WMT14 English-German and IWSLT14 German-English tasks.
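The abstract's key technical claim is that the worst-case adversarial noise on the output embedding layer has a closed-form solution. For a softmax output layer with hidden state h and target embedding w_y, the perturbation δ with ‖δ‖ ≤ ε that most increases the cross-entropy loss simply lowers the target logit h·(w_y + δ), giving δ* = −ε·h/‖h‖. The NumPy sketch below illustrates this idea under that reading; the function name, the ε hyperparameter value, and the single-example setup are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def adversarial_softmax_loss(h, W, y, eps=0.1):
    """Cross-entropy with closed-form adversarial noise on the target embedding.

    h   : hidden state, shape (d,)
    W   : output embedding matrix, shape (V, d)
    y   : target word index
    eps : noise norm bound (illustrative hyperparameter)

    The worst-case perturbation under ||delta|| <= eps is
    delta* = -eps * h / ||h||, since it maximally lowers the
    target logit h . (w_y + delta).
    """
    delta = -eps * h / (np.linalg.norm(h) + 1e-12)
    logits = W @ h
    logits[y] = (W[y] + delta) @ h        # perturb only the target word's embedding
    logits -= logits.max()                # numerically stable log-softmax
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[y]
```

Because the optimal noise is available in closed form, each training step costs only one extra inner product and normalization, which is what makes the algorithm "time-efficient" as the abstract states; setting eps=0 recovers the standard maximum-likelihood loss.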

Results

| Task                | Dataset                        | Metric                | Value | Model                                       |
|---------------------|--------------------------------|-----------------------|-------|---------------------------------------------|
| Machine Translation | WMT2014 English-German         | BLEU score            | 29.52 | Transformer Big + adversarial MLE           |
| Language Modelling  | Penn Treebank (Word Level)     | Test perplexity       | 46.01 | adversarial + AWD-LSTM-MoS + dynamic eval   |
| Language Modelling  | Penn Treebank (Word Level)     | Validation perplexity | 46.63 | adversarial + AWD-LSTM-MoS + dynamic eval   |
| Language Modelling  | WikiText-103                   | Test perplexity       | 28.0  | AdvSoft (+ 4 layer QRNN + dynamic eval)     |
| Language Modelling  | WikiText-103                   | Validation perplexity | 27.2  | AdvSoft (+ 4 layer QRNN + dynamic eval)     |
| Language Modelling  | WikiText-2                     | Test perplexity       | 38.65 | adversarial + AWD-LSTM-MoS + dynamic eval   |
| Language Modelling  | WikiText-2                     | Validation perplexity | 40.27 | adversarial + AWD-LSTM-MoS + dynamic eval   |

Related Papers

- Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
- A Translation of Probabilistic Event Calculus into Markov Decision Processes (2025-07-17)
- Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
- VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
- The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations (2025-07-17)
- Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities (2025-07-17)
- Assay2Mol: large language model-based drug design using BioAssay context (2025-07-16)
- Describe Anything Model for Visual Question Answering on Text-rich Images (2025-07-16)