Language Modelling on Penn Treebank (Word Level)
Metric: Validation perplexity (lower is better)
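
For context on the metric, word-level perplexity is conventionally the exponential of the average negative log-likelihood per token, so a lower value means the model assigns higher probability to the held-out text. The sketch below assumes natural-log token probabilities. The function name `perplexity` and the toy input are illustrative only and are not taken from the leaderboard or from any listed model's code.

```python
import math


def perplexity(token_log_probs):
    """Return exp of the negative mean natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical input: a model that assigns probability 1/50 to every token.
# Its perplexity is close to 50, in the range of the scores listed here.
uniform_log_probs = [math.log(1 / 50)] * 1000
print(perplexity(uniform_log_probs))
```

Individual papers may differ in details such as tokenization or whether the model is adapted at test time (dynamic evaluation), so this shows only the basic definition.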
Results
| # | Model | Validation perplexity | Extra Data | Paper | Date | Code |
|---|-------|-----------------------|------------|-------|------|------|
| 1 | BERT-Large-CAS | 36.1 | Yes | Language Models with Transformers | 2019-04-20 | Code |
| 2 | Mogrifier LSTM + dynamic eval | 44.8 | No | Mogrifier LSTM | 2019-09-04 | Code |
| 3 | adversarial + AWD-LSTM-MoS + dynamic eval | 46.63 | No | Improving Neural Language Modeling via Adversari... | 2019-06-10 | Code |
| 4 | GL-LWGC + AWD-MoS-LSTM + dynamic eval | 46.64 | No | Gradual Learning of Recurrent Neural Networks | 2017-08-29 | Code |
| 5 | FRAGE + AWD-LSTM-MoS + dynamic eval | 47.38 | No | FRAGE: Frequency-Agnostic Word Representation | 2018-09-18 | Code |
| 6 | Past Decode Reg. + AWD-LSTM-MoS + dyn. eval. | 48 | No | Improved Language Modeling by Decoding the Past | 2018-08-14 | - |
| 7 | AWD-LSTM-MoS + dynamic eval | 48.33 | No | Breaking the Softmax Bottleneck: A High-Rank RNN... | 2017-11-10 | Code |
| 8 | AWD-LSTM-DOC x5 | 48.63 | No | Direct Output Connection for a High-Rank Languag... | 2018-08-30 | Code |
| 9 | Ensemble of All | 48.92 | No | Advancing State of the Art in Language Modeling | 2023-11-28 | Code |
| 10 | AWD-LSTM-DRILL + dynamic eval | 49.5 | No | Deep Residual Output Layers for Neural Language ... | 2019-05-14 | Code |
| 11 | AWD-LSTM + dynamic eval | 51.6 | No | Dynamic Evaluation of Neural Sequence Models | 2017-09-21 | Code |
| 12 | AWD-LSTM-DOC + Partial Shuffle | 53.79 | No | Partially Shuffling the Training Data to Improve... | 2019-03-11 | Code |
| 13 | AWD-LSTM + continuous cache pointer | 53.9 | No | Regularizing and Optimizing LSTM Language Models | 2017-08-07 | Code |
| 14 | AWD-LSTM-DOC | 54.12 | No | Direct Output Connection for a High-Rank Languag... | 2018-08-30 | Code |
| 15 | AWD-LSTM-MoS + Partial Shuffle | 55.89 | No | Partially Shuffling the Training Data to Improve... | 2019-03-11 | Code |
| 16 | AWD-LSTM-MoS | 56.54 | No | Breaking the Softmax Bottleneck: A High-Rank RNN... | 2017-11-10 | Code |
| 17 | Transformer-XL | 56.72 | No | Transformer-XL: Attentive Language Models Beyond... | 2019-01-09 | Code |
| 18 | AWD-FWM Schlag et al. (2020) | 56.76 | No | Learning Associative Inference Using Fast Weight... | 2020-11-16 | Code |
| 19 | 2-layer skip-LSTM + dropout tuning | 57.1 | No | Pushing the bounds of dropout | 2018-05-23 | Code |
| 20 | Transformer-XL + AutoDropout | 58.1 | No | AutoDropout: Learning Dropout Patterns to Regula... | 2021-01-05 | Code |
| 21 | AWD-LSTM-DRILL | 58.2 | No | Deep Residual Output Layers for Neural Language ... | 2019-05-14 | Code |
| 22 | Differentiable NAS | 58.3 | No | DARTS: Differentiable Architecture Search | 2018-06-24 | Code |
| 23 | AWD-LSTM 3-layer with Fraternal dropout | 58.9 | No | Fraternal Dropout | 2017-10-31 | Code |
| 24 | AWD-LSTM | 60 | No | Regularizing and Optimizing LSTM Language Models | 2017-08-07 | Code |
| 25 | Efficient NAS | 60.8 | No | Efficient Neural Architecture Search via Paramet... | 2018-02-09 | Code |
| 26 | Recurrent highway networks | 67.9 | No | Recurrent Highway Networks | 2016-07-12 | Code |
| 27 | Inan et al. (2016) - Variational RHN | 68.1 | No | Tying Word Vectors and Word Classifiers: A Loss ... | 2016-11-04 | Code |
| 28 | Gal & Ghahramani (2016) - Variational LSTM (large) | 77.9 | No | A Theoretically Grounded Application of Dropout ... | 2015-12-16 | Code |
| 29 | Gal & Ghahramani (2016) - Variational LSTM (medium) | 81.9 | No | A Theoretically Grounded Application of Dropout ... | 2015-12-16 | Code |
| 30 | Zaremba et al. (2014) - LSTM (large) | 82.2 | No | Recurrent Neural Network Regularization | 2014-09-08 | Code |
| 31 | Zaremba et al. (2014) - LSTM (medium) | 86.2 | No | Recurrent Neural Network Regularization | 2014-09-08 | Code |