Language Modelling on WikiText-2
Metric: Validation perplexity (lower is better)
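For reference, perplexity is conventionally defined as the exponential of the average negative log-likelihood per token, which is why lower values are better. The sketch below illustrates that standard definition only. The leaderboard does not say how each paper tokenized the data or computed its score, and the function name `perplexity` and the sample inputs are hypothetical.

```python
import math


def perplexity(token_log_probs):
    """Return perplexity given natural-log probabilities, one per token.

    Hypothetical helper illustrating the standard definition:
    exp(-(1/N) * sum(log p(token_i | context_i))).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Illustrative input (not taken from any listed paper): a model that assigns
# probability 0.25 to every token is as uncertain as a uniform choice among
# four options, so its perplexity is approximately 4.
uniform_guess = [math.log(0.25)] * 8
print(perplexity(uniform_guess))
```

Under this definition, a model that assigns higher probability to the held-out validation tokens gets a lower perplexity, so the table ranks models in ascending order of the metric.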
Results
| # | Model | Validation perplexity | Extra Data | Paper | Date | Code |
|---|-------|-----------------------|------------|-------|------|------|
| 1 | GPT-2 (fine-tuned) | 15.69 | Yes | Hydra: A System for Large Multi-Model Deep Learn... | 2021-10-16 | Code |
| 2 | BERT-Large-CAS | 37.7 | Yes | Language Models with Transformers | 2019-04-20 | Code |
| 3 | Mogrifier LSTM + dynamic eval | 40.2 | No | Mogrifier LSTM | 2019-09-04 | Code |
| 4 | adversarial + AWD-LSTM-MoS + dynamic eval | 40.27 | No | Improving Neural Language Modeling via Adversari... | 2019-06-10 | Code |
| 5 | FRAGE + AWD-LSTM-MoS + dynamic eval | 40.85 | No | FRAGE: Frequency-Agnostic Word Representation | 2018-09-18 | Code |
| 6 | Past Decode Reg. + AWD-LSTM-MoS + dyn. eval. | 42 | No | Improved Language Modeling by Decoding the Past | 2018-08-14 | - |
| 7 | GL-LWGC + AWD-MoS-LSTM + dynamic eval | 42.19 | No | Gradual Learning of Recurrent Neural Networks | 2017-08-29 | Code |
| 8 | AWD-LSTM-MoS + dynamic eval | 42.41 | No | Breaking the Softmax Bottleneck: A High-Rank RNN... | 2017-11-10 | Code |
| 9 | AWD-LSTM-DRILL + dynamic eval | 43.9 | No | Deep Residual Output Layers for Neural Language ... | 2019-05-14 | Code |
| 10 | AWD-LSTM + dynamic eval | 46.4 | No | Dynamic Evaluation of Neural Sequence Models | 2017-09-21 | Code |
| 11 | AWD-LSTM + continuous cache pointer | 53.8 | No | Regularizing and Optimizing LSTM Language Models | 2017-08-07 | Code |
| 12 | AWD-LSTM-DOC x5 | 54.19 | No | Direct Output Connection for a High-Rank Languag... | 2018-08-30 | Code |
| 13 | AWD-FWM Schlag et al. (2020) | 54.48 | No | Learning Associative Inference Using Fast Weight... | 2020-11-16 | Code |
| 14 | Ensemble of All | 55.4 | No | Advancing State of the Art in Language Modeling | 2023-11-28 | Code |
| 15 | Mogrifier LSTM | 57.3 | No | Mogrifier LSTM | 2019-09-04 | Code |
| 16 | AWD-LSTM-DOC + Partial Shuffle | 60.16 | No | Partially Shuffling the Training Data to Improve... | 2019-03-11 | Code |
| 17 | AWD-LSTM-DOC | 60.29 | No | Direct Output Connection for a High-Rank Languag... | 2018-08-30 | Code |
| 18 | AWD-LSTM-MoS + Partial Shuffle | 62.38 | No | Partially Shuffling the Training Data to Improve... | 2019-03-11 | Code |
| 19 | AWD-LSTM-MoS | 63.88 | No | Breaking the Softmax Bottleneck: A High-Rank RNN... | 2017-11-10 | Code |
| 20 | AWD-LSTM-DRILL | 64.9 | No | Deep Residual Output Layers for Neural Language ... | 2019-05-14 | Code |
| 21 | AWD-LSTM 3-layer with Fraternal dropout | 66.8 | No | Fraternal Dropout | 2017-10-31 | Code |
| 22 | AWD-LSTM + ATOI | 67.47 | No | Alleviating Sequence Information Loss with Data ... | 2019-09-18 | Code |
| 23 | AWD-LSTM | 68.6 | No | Regularizing and Optimizing LSTM Language Models | 2017-08-07 | Code |
| 24 | Melis et al. (2017) - 1-layer LSTM (tied) | 69.3 | No | On the State of the Art of Evaluation in Neural ... | 2017-07-18 | Code |
| 25 | Inan et al. (2016) - Variational LSTM (tied) (h=650) + augmented loss | 91.5 | No | Tying Word Vectors and Word Classifiers: A Loss ... | 2016-11-04 | Code |
| 26 | Inan et al. (2016) - Variational LSTM (tied) (h=650) | 92.3 | No | Tying Word Vectors and Word Classifiers: A Loss ... | 2016-11-04 | Code |