Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Improving Neural Language Modeling via Adversarial Training

Dilin Wang, Chengyue Gong, Qiang Liu

2019-06-10 · Machine Translation · Translation · Language Modelling

Paper · PDF · Code (official)

Abstract

Recently, substantial progress has been made in language modeling by using deep neural networks. However, in practice, large-scale neural language models have been shown to be prone to overfitting. In this paper, we present a simple yet highly effective adversarial training mechanism for regularizing neural language models. The idea is to introduce adversarial noise to the output embedding layer while training the models. We show that the optimal adversarial noise yields a simple closed-form solution, thus allowing us to develop a simple and time-efficient algorithm. Theoretically, we show that our adversarial mechanism effectively encourages the diversity of the embedding vectors, helping to increase the robustness of models. Empirically, we show that our method improves on the single-model state-of-the-art results for language modeling on Penn Treebank (PTB) and WikiText-2, achieving test perplexity scores of 46.01 and 38.07, respectively. When applied to machine translation, our method improves over various transformer-based translation baselines in BLEU scores on the WMT14 English-German and IWSLT14 German-English tasks.
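The abstract's key technical claim is that the worst-case adversarial noise on the output embedding layer has a closed-form solution. For a softmax output layer with hidden state h and target embedding w_y, the perturbation δ with ‖δ‖ ≤ ε that most increases the cross-entropy loss simply lowers the target logit h·(w_y + δ), giving δ* = −ε·h/‖h‖. The NumPy sketch below illustrates this idea under that reading; the function name, the ε hyperparameter value, and the single-example setup are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def adversarial_softmax_loss(h, W, y, eps=0.1):
    """Cross-entropy with closed-form adversarial noise on the target embedding.

    h   : hidden state, shape (d,)
    W   : output embedding matrix, shape (V, d)
    y   : target word index
    eps : noise norm bound (illustrative hyperparameter)

    The worst-case perturbation under ||delta|| <= eps is
    delta* = -eps * h / ||h||, since it maximally lowers the
    target logit h . (w_y + delta).
    """
    delta = -eps * h / (np.linalg.norm(h) + 1e-12)
    logits = W @ h
    logits[y] = (W[y] + delta) @ h        # perturb only the target word's embedding
    logits -= logits.max()                # numerically stable log-softmax
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[y]
```

Because the optimal noise is available in closed form, each training step costs only one extra inner product and normalization, which is what makes the algorithm "time-efficient" as the abstract states; setting eps=0 recovers the standard maximum-likelihood loss.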

Results

| Task                | Dataset                        | Metric                | Value | Model                                       |
|---------------------|--------------------------------|-----------------------|-------|---------------------------------------------|
| Machine Translation | WMT2014 English-German         | BLEU score            | 29.52 | Transformer Big + adversarial MLE           |
| Language Modelling  | Penn Treebank (Word Level)     | Test perplexity       | 46.01 | adversarial + AWD-LSTM-MoS + dynamic eval   |
| Language Modelling  | Penn Treebank (Word Level)     | Validation perplexity | 46.63 | adversarial + AWD-LSTM-MoS + dynamic eval   |
| Language Modelling  | WikiText-103                   | Test perplexity       | 28.0  | AdvSoft (+ 4 layer QRNN + dynamic eval)     |
| Language Modelling  | WikiText-103                   | Validation perplexity | 27.2  | AdvSoft (+ 4 layer QRNN + dynamic eval)     |
| Language Modelling  | WikiText-2                     | Test perplexity       | 38.65 | adversarial + AWD-LSTM-MoS + dynamic eval   |
| Language Modelling  | WikiText-2                     | Validation perplexity | 40.27 | adversarial + AWD-LSTM-MoS + dynamic eval   |

Related Papers

- Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
- A Translation of Probabilistic Event Calculus into Markov Decision Processes (2025-07-17)
- Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
- VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
- The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations (2025-07-17)
- Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities (2025-07-17)
- Assay2Mol: large language model-based drug design using BioAssay context (2025-07-16)
- Describe Anything Model for Visual Question Answering on Text-rich Images (2025-07-16)