Language Modelling on Penn Treebank (Word Level)
Metric: Test perplexity (lower is better)
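The page does not say how each paper computed this metric. As a minimal sketch, assuming the standard definition (the exponential of the average negative log-likelihood per test token, using natural logarithms), perplexity can be computed as below. The function name `perplexity` and the sample inputs are illustrative and do not come from any listed paper.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from natural-log probabilities, one per test token.

    Assumes the standard definition: exp(mean negative log-likelihood).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


# Hypothetical model that gives every test token probability 1/20.5.
uniform_log_probs = [math.log(1 / 20.5)] * 1000
print(round(perplexity(uniform_log_probs), 2))  # prints 20.5
```

Under this definition, a perplexity of 20.5 means the model is on average as uncertain as a uniform choice among about 20.5 words per token, which is why lower values rank higher.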
Results
| # | Model | Test perplexity | Extra Data | Paper | Date | Code | SOTA |
|---|---|---|---|---|---|---|---|
| 1 | GPT-3 (Zero-Shot) | 20.5 | Yes | Language Models are Few-Shot Learners | 2020-05-28 | Yes | SOTA |
| 2 | BERT-Large-CAS | 31.3 | Yes | Language Models with Transformers | 2019-04-20 | Yes | SOTA |
| 3 | GPT-2 | 35.76 | Yes | - | - | Yes | |
| 4 | Mogrifier LSTM + dynamic eval | 44.9 | No | Mogrifier LSTM | 2019-09-04 | Yes | |
| 5 | adversarial + AWD-LSTM-MoS + dynamic eval | 46.01 | No | Improving Neural Language Modeling via Adversarial Training | 2019-06-10 | Yes | |
| 6 | GL-LWGC + AWD-MoS-LSTM + dynamic eval | 46.34 | No | Gradual Learning of Recurrent Neural Networks | 2017-08-29 | Yes | SOTA |
| 7 | FRAGE + AWD-LSTM-MoS + dynamic eval | 46.54 | No | FRAGE: Frequency-Agnostic Word Representation | 2018-09-18 | Yes | |
| 8 | AWD-LSTM-DOC x5 | 47.17 | No | Direct Output Connection for a High-Rank Language Model | 2018-08-30 | Yes | |
| 9 | Past Decode Reg. + AWD-LSTM-MoS + dyn. eval. | 47.3 | No | Improved Language Modeling by Decoding the Past | 2018-08-14 | - | |
| 10 | Ensemble of All | 47.31 | No | Advancing State of the Art in Language Modeling | 2023-11-28 | Yes | |
| 11 | AWD-LSTM-MoS + dynamic eval | 47.69 | No | Breaking the Softmax Bottleneck: A High-Rank RNN Language Model | 2017-11-10 | Yes | |
| 12 | AWD-LSTM-DRILL + dynamic eval | 49.4 | No | Deep Residual Output Layers for Neural Language Generation | 2019-05-14 | Yes | |
| 13 | Dense IndRNN+dynamic eval | 50.97 | No | Deep Independently Recurrent Neural Network (IndRNN) | 2019-10-11 | Yes | |
| 14 | AWD-LSTM + dynamic eval | 51.1 | No | Dynamic Evaluation of Neural Sequence Models | 2017-09-21 | Yes | |
| 15 | AWD-LSTM-DOC + Partial Shuffle | 52 | No | Partially Shuffling the Training Data to Improve Language Models | 2019-03-11 | Yes | |
| 16 | AWD-LSTM-DOC | 52.38 | No | Direct Output Connection for a High-Rank Language Model | 2018-08-30 | Yes | |
| 17 | AWD-LSTM + continuous cache pointer | 52.8 | No | Regularizing and Optimizing LSTM Language Models | 2017-08-07 | Yes | SOTA |
| 18 | AWD-LSTM-MoS + Partial Shuffle | 53.92 | No | Partially Shuffling the Training Data to Improve Language Models | 2019-03-11 | Yes | |
| 19 | Trellis Network | 54.19 | No | Trellis Networks for Sequence Modeling | 2018-10-15 | Yes | |
| 20 | AWD-LSTM-MoS | 54.44 | No | Breaking the Softmax Bottleneck: A High-Rank RNN Language Model | 2017-11-10 | Yes | |
| 21 | AWD-FWM Schlag et al. (2020) | 54.48 | No | Learning Associative Inference Using Fast Weight Memory | 2020-11-16 | Yes | |
| 22 | Transformer-XL | 54.55 | No | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | 2019-01-09 | Yes | |
| 23 | Transformer-XL + AutoDropout | 54.9 | No | AutoDropout: Learning Dropout Patterns to Regularize Deep Networks | 2021-01-05 | Yes | |
| 24 | 2-layer skip-LSTM + dropout tuning | 55.3 | No | Pushing the bounds of dropout | 2018-05-23 | Yes | |
| 25 | AWD-LSTM-DRILL | 55.7 | No | Deep Residual Output Layers for Neural Language Generation | 2019-05-14 | Yes | |
| 26 | Differentiable NAS | 56.1 | No | DARTS: Differentiable Architecture Search | 2018-06-24 | Yes | |
| 27 | Dense IndRNN | 56.37 | No | Deep Independently Recurrent Neural Network (IndRNN) | 2019-10-11 | Yes | |
| 28 | AWD-LSTM 3-layer with Fraternal dropout | 56.8 | No | Fraternal Dropout | 2017-10-31 | Yes | |
| 29 | DEQ-TrellisNet | 57.1 | No | Deep Equilibrium Models | 2019-09-03 | Yes | |
| 30 | AWD-LSTM | 57.3 | No | Regularizing and Optimizing LSTM Language Models | 2017-08-07 | Yes | SOTA |
| 31 | Efficient NAS | 58.6 | No | Efficient Neural Architecture Search via Parameter Sharing | 2018-02-09 | Yes | |
| 32 | NAS-RL | 64 | No | Neural Architecture Search with Reinforcement Learning | 2016-11-05 | Yes | SOTA |
| 33 | Recurrent highway networks | 65.4 | No | Recurrent Highway Networks | 2016-07-12 | Yes | SOTA |
| 34 | Inan et al. (2016) - Variational RHN | 66 | No | Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling | 2016-11-04 | Yes | |
| 35 | Gal & Ghahramani (2016) - Variational LSTM (large) | 75.2 | No | A Theoretically Grounded Application of Dropout in Recurrent Neural Networks | 2015-12-16 | Yes | SOTA |
| 36 | Zaremba et al. (2014) - LSTM (large) | 78.4 | No | Recurrent Neural Network Regularization | 2014-09-08 | Yes | SOTA |
| 37 | LSTM (Bai et al., 2018) | 78.93 | No | An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling | 2018-03-04 | Yes | |
| 38 | Gal & Ghahramani (2016) - Variational LSTM (medium) | 79.7 | No | A Theoretically Grounded Application of Dropout in Recurrent Neural Networks | 2015-12-16 | Yes | |
| 39 | Zaremba et al. (2014) - LSTM (medium) | 82.7 | No | Recurrent Neural Network Regularization | 2014-09-08 | Yes | SOTA |
| 40 | R-Transformer | 84.38 | No | R-Transformer: Recurrent Neural Network Enhanced Transformer | 2019-07-12 | Yes | |
| 41 | GRU (Bai et al., 2018) | 92.48 | No | An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling | 2018-03-04 | Yes | |
| 42 | Seq-U-Net | 107.95 | No | Seq-U-Net: A One-Dimensional Causal U-Net for Efficient Sequence Modelling | 2019-11-14 | Yes | |
| 43 | TCN | 108.47 | No | Seq-U-Net: A One-Dimensional Causal U-Net for Efficient Sequence Modelling | 2019-11-14 | Yes | |