Language Modelling on Penn Treebank (Word Level)
Metric: Validation perplexity (lower is better)
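For readers unfamiliar with the metric, here is a minimal sketch of how word-level perplexity is conventionally computed: the exponential of the mean negative log-likelihood per token on the validation set. The function name `perplexity` and its input format (a list of natural-log token probabilities) are assumptions made for illustration and are not taken from any paper listed on this page.

```python
import math

def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities (hypothetical helper)."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    # Mean negative log-likelihood per token
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)

# A model that assigns probability 1/50 to every token has perplexity 50
uniform = [math.log(1 / 50)] * 1000
print(round(perplexity(uniform), 2))  # prints 50.0
```

Under this definition, a validation perplexity of 50 means the model is on average as uncertain as a uniform choice among 50 words, which is why lower values rank higher.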
Results
| # | Model | Validation perplexity | Extra Data | SOTA | Paper | Date | Code |
|---|---|---|---|---|---|---|---|
| 1 | BERT-Large-CAS | 36.1 | Yes | SOTA | Language Models with Transformers | 2019-04-20 | Code |
| 2 | Mogrifier LSTM + dynamic eval | 44.8 | No | | Mogrifier LSTM | 2019-09-04 | Code |
| 3 | adversarial + AWD-LSTM-MoS + dynamic eval | 46.63 | No | | Improving Neural Language Modeling via Adversarial Training | 2019-06-10 | Code |
| 4 | GL-LWGC + AWD-MoS-LSTM + dynamic eval | 46.64 | No | SOTA | Gradual Learning of Recurrent Neural Networks | 2017-08-29 | Code |
| 5 | FRAGE + AWD-LSTM-MoS + dynamic eval | 47.38 | No | | FRAGE: Frequency-Agnostic Word Representation | 2018-09-18 | Code |
| 6 | Past Decode Reg. + AWD-LSTM-MoS + dyn. eval. | 48 | No | | Improved Language Modeling by Decoding the Past | 2018-08-14 | - |
| 7 | AWD-LSTM-MoS + dynamic eval | 48.33 | No | | Breaking the Softmax Bottleneck: A High-Rank RNN Language Model | 2017-11-10 | Code |
| 8 | AWD-LSTM-DOC x5 | 48.63 | No | | Direct Output Connection for a High-Rank Language Model | 2018-08-30 | Code |
| 9 | Ensemble of All | 48.92 | No | | Advancing State of the Art in Language Modeling | 2023-11-28 | Code |
| 10 | AWD-LSTM-DRILL + dynamic eval | 49.5 | No | | Deep Residual Output Layers for Neural Language Generation | 2019-05-14 | Code |
| 11 | AWD-LSTM + dynamic eval | 51.6 | No | | Dynamic Evaluation of Neural Sequence Models | 2017-09-21 | Code |
| 12 | AWD-LSTM-DOC + Partial Shuffle | 53.79 | No | | Partially Shuffling the Training Data to Improve Language Models | 2019-03-11 | Code |
| 13 | AWD-LSTM + continuous cache pointer | 53.9 | No | SOTA | Regularizing and Optimizing LSTM Language Models | 2017-08-07 | Code |
| 14 | AWD-LSTM-DOC | 54.12 | No | | Direct Output Connection for a High-Rank Language Model | 2018-08-30 | Code |
| 15 | AWD-LSTM-MoS + Partial Shuffle | 55.89 | No | | Partially Shuffling the Training Data to Improve Language Models | 2019-03-11 | Code |
| 16 | AWD-LSTM-MoS | 56.54 | No | | Breaking the Softmax Bottleneck: A High-Rank RNN Language Model | 2017-11-10 | Code |
| 17 | Transformer-XL | 56.72 | No | | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | 2019-01-09 | Code |
| 18 | AWD-FWM Schlag et al. (2020) | 56.76 | No | | Learning Associative Inference Using Fast Weight Memory | 2020-11-16 | Code |
| 19 | 2-layer skip-LSTM + dropout tuning | 57.1 | No | | Pushing the bounds of dropout | 2018-05-23 | Code |
| 20 | Transformer-XL + AutoDropout | 58.1 | No | | AutoDropout: Learning Dropout Patterns to Regularize Deep Networks | 2021-01-05 | Code |
| 21 | AWD-LSTM-DRILL | 58.2 | No | | Deep Residual Output Layers for Neural Language Generation | 2019-05-14 | Code |
| 22 | Differentiable NAS | 58.3 | No | | DARTS: Differentiable Architecture Search | 2018-06-24 | Code |
| 23 | AWD-LSTM 3-layer with Fraternal dropout | 58.9 | No | | Fraternal Dropout | 2017-10-31 | Code |
| 24 | AWD-LSTM | 60 | No | SOTA | Regularizing and Optimizing LSTM Language Models | 2017-08-07 | Code |
| 25 | Efficient NAS | 60.8 | No | | Efficient Neural Architecture Search via Parameter Sharing | 2018-02-09 | Code |
| 26 | Recurrent highway networks | 67.9 | No | SOTA | Recurrent Highway Networks | 2016-07-12 | Code |
| 27 | Inan et al. (2016) - Variational RHN | 68.1 | No | | Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling | 2016-11-04 | Code |
| 28 | Gal & Ghahramani (2016) - Variational LSTM (large) | 77.9 | No | SOTA | A Theoretically Grounded Application of Dropout in Recurrent Neural Networks | 2015-12-16 | Code |
| 29 | Gal & Ghahramani (2016) - Variational LSTM (medium) | 81.9 | No | SOTA | A Theoretically Grounded Application of Dropout in Recurrent Neural Networks | 2015-12-16 | Code |
| 30 | Zaremba et al. (2014) - LSTM (large) | 82.2 | No | SOTA | Recurrent Neural Network Regularization | 2014-09-08 | Code |
| 31 | Zaremba et al. (2014) - LSTM (medium) | 86.2 | No | SOTA | Recurrent Neural Network Regularization | 2014-09-08 | Code |