Sho Takase, Jun Suzuki, Masaaki Nagata
This paper proposes a state-of-the-art recurrent neural network (RNN) language model that combines probability distributions computed not only from the final RNN layer but also from intermediate layers. The proposed method raises the expressive power of a language model, following the matrix factorization interpretation of language modeling introduced by Yang et al. (2018). It improves on the current state-of-the-art language model and achieves the best perplexity on Penn Treebank and WikiText-2, the standard benchmark datasets. Moreover, we show that the proposed method also benefits two application tasks: machine translation and headline generation. Our code is publicly available at: https://github.com/nttcslab-nlp/doc_lm.
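The core idea can be illustrated with a short sketch: each RNN layer's hidden state is projected to the vocabulary, converted into a softmax distribution, and the per-layer distributions are mixed with learned weights, so the resulting log-probability matrix can have higher rank than a single final-layer softmax allows (the mixture-of-softmaxes view). The PyTorch sketch below is a minimal, hypothetical rendering of this idea; the class name `DOCHead`, its constructor arguments, and the weighting scheme are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DOCHead(nn.Module):
    """Illustrative direct-output-connection head (hypothetical names).

    Computes a softmax distribution from the hidden state of every RNN
    layer, not just the last one, and mixes the distributions with
    learned weights, raising the rank of the log-probability matrix.
    """

    def __init__(self, hidden_size: int, vocab_size: int, num_layers: int,
                 mixtures_per_layer: int = 1):
        super().__init__()
        self.mixtures_per_layer = mixtures_per_layer
        self.k = num_layers * mixtures_per_layer  # total mixture components
        # One projection to the vocabulary per mixture component.
        self.proj = nn.ModuleList(
            [nn.Linear(hidden_size, vocab_size) for _ in range(self.k)]
        )
        # Mixture weights (prior) conditioned on the final layer's state.
        self.prior = nn.Linear(hidden_size, self.k)

    def forward(self, layer_states: list) -> torch.Tensor:
        # layer_states: one (batch, hidden_size) tensor per RNN layer.
        pi = F.softmax(self.prior(layer_states[-1]), dim=-1)  # (batch, k)
        probs = torch.zeros(1)
        for i, h in enumerate(layer_states):
            for j in range(self.mixtures_per_layer):
                c = i * self.mixtures_per_layer + j
                p_c = F.softmax(self.proj[c](h), dim=-1)  # (batch, vocab)
                probs = probs + pi[:, c:c + 1] * p_c      # weighted mixture
        return torch.log(probs)  # log-probabilities over the vocabulary


# Toy usage: hidden states from a 3-layer RNN, vocabulary of 10 words.
states = [torch.randn(2, 16) for _ in range(3)]
head = DOCHead(hidden_size=16, vocab_size=10, num_layers=3)
log_p = head(states)  # shape (2, 10); exp of each row sums to 1
```

Because each component is a full softmax and the mixture weights sum to one, the combined output remains a valid probability distribution while drawing on information from every layer.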
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Language Modelling | Penn Treebank (Word Level) | Test perplexity | 47.17 | AWD-LSTM-DOC x5 |
| Language Modelling | Penn Treebank (Word Level) | Validation perplexity | 48.63 | AWD-LSTM-DOC x5 |
| Language Modelling | Penn Treebank (Word Level) | Test perplexity | 52.38 | AWD-LSTM-DOC |
| Language Modelling | Penn Treebank (Word Level) | Validation perplexity | 54.12 | AWD-LSTM-DOC |
| Language Modelling | WikiText-2 | Test perplexity | 53.09 | AWD-LSTM-DOC x5 |
| Language Modelling | WikiText-2 | Validation perplexity | 54.19 | AWD-LSTM-DOC x5 |
| Language Modelling | WikiText-2 | Test perplexity | 58.03 | AWD-LSTM-DOC |
| Language Modelling | WikiText-2 | Validation perplexity | 60.29 | AWD-LSTM-DOC |
| Constituency Parsing | Penn Treebank | F1 score | 94.47 | LSTM Encoder-Decoder + LSTM-LM |