Language Modelling on WikiText-2
Metric: Validation perplexity (lower is better)
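Perplexity is the exponential of the average negative log-likelihood per token on the evaluation split, so lower values mean the model assigns higher probability to the held-out text. The sketch below shows that computation. It assumes the natural-log probability of each target token is already available from some model. The function name `perplexity` is hypothetical. What counts as a "token" (word or subword) depends on each paper's evaluation setup and is not specified here.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from natural-log probabilities of each target token.

    Hypothetical helper: ppl = exp(-(1/N) * sum(log p(token_i))).
    """
    if not token_log_probs:
        raise ValueError("need at least one token")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)  # mean negative log-likelihood
    return math.exp(avg_nll)


# A model that gives every token probability 1/50 has perplexity 50.
uniform = [math.log(1 / 50)] * 10
print(round(perplexity(uniform), 6))  # 50.0
```

A perplexity of k can be read as the model being, on average, as uncertain as a uniform choice among k tokens.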
Results
Rank | Model | Validation perplexity | Extra data | Paper | Date | Code | Marked SOTA
---- | ----- | --------------------- | ---------- | ----- | ---- | ---- | -----------
1 | GPT-2 (fine-tuned) | 15.69 | Yes | Hydra: A System for Large Multi-Model Deep Learning | 2021-10-16 | Yes | Yes
2 | BERT-Large-CAS | 37.7 | Yes | Language Models with Transformers | 2019-04-20 | Yes | Yes
3 | Mogrifier LSTM + dynamic eval | 40.2 | No | Mogrifier LSTM | 2019-09-04 | Yes | No
4 | adversarial + AWD-LSTM-MoS + dynamic eval | 40.27 | No | Improving Neural Language Modeling via Adversarial Training | 2019-06-10 | Yes | No
5 | FRAGE + AWD-LSTM-MoS + dynamic eval | 40.85 | No | FRAGE: Frequency-Agnostic Word Representation | 2018-09-18 | Yes | Yes
6 | Past Decode Reg. + AWD-LSTM-MoS + dyn. eval. | 42 | No | Improved Language Modeling by Decoding the Past | 2018-08-14 | No | Yes
7 | GL-LWGC + AWD-MoS-LSTM + dynamic eval | 42.19 | No | Gradual Learning of Recurrent Neural Networks | 2017-08-29 | Yes | Yes
8 | AWD-LSTM-MoS + dynamic eval | 42.41 | No | Breaking the Softmax Bottleneck: A High-Rank RNN Language Model | 2017-11-10 | Yes | No
9 | AWD-LSTM-DRILL + dynamic eval | 43.9 | No | Deep Residual Output Layers for Neural Language Generation | 2019-05-14 | Yes | No
10 | AWD-LSTM + dynamic eval | 46.4 | No | Dynamic Evaluation of Neural Sequence Models | 2017-09-21 | Yes | No
11 | AWD-LSTM + continuous cache pointer | 53.8 | No | Regularizing and Optimizing LSTM Language Models | 2017-08-07 | Yes | Yes
12 | AWD-LSTM-DOC x5 | 54.19 | No | Direct Output Connection for a High-Rank Language Model | 2018-08-30 | Yes | No
13 | AWD-FWM Schlag et al. (2020) | 54.48 | No | Learning Associative Inference Using Fast Weight Memory | 2020-11-16 | Yes | No
14 | Ensemble of All | 55.4 | No | Advancing State of the Art in Language Modeling | 2023-11-28 | Yes | No
15 | Mogrifier LSTM | 57.3 | No | Mogrifier LSTM | 2019-09-04 | Yes | No
16 | AWD-LSTM-DOC + Partial Shuffle | 60.16 | No | Partially Shuffling the Training Data to Improve Language Models | 2019-03-11 | Yes | No
17 | AWD-LSTM-DOC | 60.29 | No | Direct Output Connection for a High-Rank Language Model | 2018-08-30 | Yes | No
18 | AWD-LSTM-MoS + Partial Shuffle | 62.38 | No | Partially Shuffling the Training Data to Improve Language Models | 2019-03-11 | Yes | No
19 | AWD-LSTM-MoS | 63.88 | No | Breaking the Softmax Bottleneck: A High-Rank RNN Language Model | 2017-11-10 | Yes | No
20 | AWD-LSTM-DRILL | 64.9 | No | Deep Residual Output Layers for Neural Language Generation | 2019-05-14 | Yes | No
21 | AWD-LSTM 3-layer with Fraternal dropout | 66.8 | No | Fraternal Dropout | 2017-10-31 | Yes | No
22 | AWD-LSTM + ATOI | 67.47 | No | Alleviating Sequence Information Loss with Data Overlapping and Prime Batch Sizes | 2019-09-18 | Yes | No
23 | AWD-LSTM | 68.6 | No | Regularizing and Optimizing LSTM Language Models | 2017-08-07 | Yes | Yes
24 | Melis et al. (2017) - 1-layer LSTM (tied) | 69.3 | No | On the State of the Art of Evaluation in Neural Language Models | 2017-07-18 | Yes | Yes
25 | Inan et al. (2016) - Variational LSTM (tied) (h=650) + augmented loss | 91.5 | No | Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling | 2016-11-04 | Yes | Yes
26 | Inan et al. (2016) - Variational LSTM (tied) (h=650) | 92.3 | No | Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling | 2016-11-04 | Yes | Yes
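The leaderboard marks some entries as SOTA. The site's exact rule is not stated on this page. One assumption that reproduces the marks for these 26 rows is that an entry counts as a record when its perplexity is lower than the best result from any strictly earlier date. Under that rule, results published on the same date, such as the two Inan et al. (2016) variants, can both count. The sketch below applies the rule. The data are the model names, perplexities, and dates copied from the leaderboard. The function name `record_setters` is hypothetical.

```python
# (model, validation perplexity, date) as listed on the leaderboard
RESULTS = [
    ("GPT-2 (fine-tuned)", 15.69, "2021-10-16"),
    ("BERT-Large-CAS", 37.7, "2019-04-20"),
    ("Mogrifier LSTM + dynamic eval", 40.2, "2019-09-04"),
    ("adversarial + AWD-LSTM-MoS + dynamic eval", 40.27, "2019-06-10"),
    ("FRAGE + AWD-LSTM-MoS + dynamic eval", 40.85, "2018-09-18"),
    ("Past Decode Reg. + AWD-LSTM-MoS + dyn. eval.", 42.0, "2018-08-14"),
    ("GL-LWGC + AWD-MoS-LSTM + dynamic eval", 42.19, "2017-08-29"),
    ("AWD-LSTM-MoS + dynamic eval", 42.41, "2017-11-10"),
    ("AWD-LSTM-DRILL + dynamic eval", 43.9, "2019-05-14"),
    ("AWD-LSTM + dynamic eval", 46.4, "2017-09-21"),
    ("AWD-LSTM + continuous cache pointer", 53.8, "2017-08-07"),
    ("AWD-LSTM-DOC x5", 54.19, "2018-08-30"),
    ("AWD-FWM Schlag et al. (2020)", 54.48, "2020-11-16"),
    ("Ensemble of All", 55.4, "2023-11-28"),
    ("Mogrifier LSTM", 57.3, "2019-09-04"),
    ("AWD-LSTM-DOC + Partial Shuffle", 60.16, "2019-03-11"),
    ("AWD-LSTM-DOC", 60.29, "2018-08-30"),
    ("AWD-LSTM-MoS + Partial Shuffle", 62.38, "2019-03-11"),
    ("AWD-LSTM-MoS", 63.88, "2017-11-10"),
    ("AWD-LSTM-DRILL", 64.9, "2019-05-14"),
    ("AWD-LSTM 3-layer with Fraternal dropout", 66.8, "2017-10-31"),
    ("AWD-LSTM + ATOI", 67.47, "2019-09-18"),
    ("AWD-LSTM", 68.6, "2017-08-07"),
    ("Melis et al. (2017) - 1-layer LSTM (tied)", 69.3, "2017-07-18"),
    ("Inan et al. (2016) - Variational LSTM (tied) (h=650) + augmented loss", 91.5, "2016-11-04"),
    ("Inan et al. (2016) - Variational LSTM (tied) (h=650)", 92.3, "2016-11-04"),
]


def record_setters(results):
    """Hypothetical rule: keep entries beating every result from a strictly earlier date."""
    # ISO-8601 date strings sort chronologically as plain strings.
    by_date = sorted(results, key=lambda r: r[2])
    best_before = float("inf")  # best perplexity from strictly earlier dates
    best_today = float("inf")   # best perplexity seen so far on the current date
    current_day = None
    records = []
    for model, ppl, day in by_date:
        if day != current_day:
            # Entering a new date: fold the previous date's best into the baseline.
            best_before = min(best_before, best_today)
            best_today = float("inf")
            current_day = day
        if ppl < best_before:
            records.append((day, model, ppl))
        best_today = min(best_today, ppl)
    return records


for day, model, ppl in record_setters(RESULTS):
    print(day, ppl, model)
```

With these rows the rule selects the same ten entries that the leaderboard marks as SOTA. This agreement does not show that the site uses the same rule.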