Language Modelling on WikiText-2

Metric: Validation perplexity (lower is better)
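Validation perplexity is the exponential of the mean per-token negative log-likelihood on the validation split: PPL = exp(-(1/N) Σ log p(w_i | w_<i)). The Python sketch below makes this concrete. It assumes the model's per-token natural-log probabilities are already available as a plain list. The function names `perplexity` and `cross_entropy_from_perplexity` are illustrative and do not come from any of the listed papers. The second function inverts the definition, so any value in the table can be read as a cross-entropy in nats or bits per token.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity = exp of the mean negative log-likelihood per token.

    `token_log_probs` holds natural-log probabilities log p(w_i | w_<i),
    one per predicted token (an assumed input format for this sketch).
    """
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def cross_entropy_from_perplexity(ppl: float) -> tuple[float, float]:
    """Invert PPL = exp(H): return (nats per token, bits per token)."""
    nats = math.log(ppl)
    return nats, nats / math.log(2)


# A model that assigns probability 1/4 to every token has perplexity 4.
uniform = [math.log(0.25)] * 10
print(perplexity(uniform))  # prints a value very close to 4.0

# Two leaderboard values re-expressed as cross-entropy.
for ppl in (68.6, 15.69):
    nats, bits = cross_entropy_from_perplexity(ppl)
    print(f"PPL {ppl}: {nats:.3f} nats/token, {bits:.3f} bits/token")
```

Because perplexity is exponential in the average loss, small differences in cross-entropy near the top of the table correspond to visible differences in perplexity.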

Results

#	Model	Validation perplexity	Extra Data	SOTA	Paper	Date	Code
1	GPT-2 (fine-tuned)	15.69	Yes	Yes	Hydra: A System for Large Multi-Model Deep Learning	2021-10-16	Yes
2	BERT-Large-CAS	37.7	Yes	Yes	Language Models with Transformers	2019-04-20	Yes
3	Mogrifier LSTM + dynamic eval	40.2	No	No	Mogrifier LSTM	2019-09-04	Yes
4	adversarial + AWD-LSTM-MoS + dynamic eval	40.27	No	No	Improving Neural Language Modeling via Adversarial Training	2019-06-10	Yes
5	FRAGE + AWD-LSTM-MoS + dynamic eval	40.85	No	Yes	FRAGE: Frequency-Agnostic Word Representation	2018-09-18	Yes
6	Past Decode Reg. + AWD-LSTM-MoS + dyn. eval.	42	No	Yes	Improved Language Modeling by Decoding the Past	2018-08-14	No
7	GL-LWGC + AWD-MoS-LSTM + dynamic eval	42.19	No	Yes	Gradual Learning of Recurrent Neural Networks	2017-08-29	Yes
8	AWD-LSTM-MoS + dynamic eval	42.41	No	No	Breaking the Softmax Bottleneck: A High-Rank RNN Language Model	2017-11-10	Yes
9	AWD-LSTM-DRILL + dynamic eval	43.9	No	No	Deep Residual Output Layers for Neural Language Generation	2019-05-14	Yes
10	AWD-LSTM + dynamic eval	46.4	No	No	Dynamic Evaluation of Neural Sequence Models	2017-09-21	Yes
11	AWD-LSTM + continuous cache pointer	53.8	No	Yes	Regularizing and Optimizing LSTM Language Models	2017-08-07	Yes
12	AWD-LSTM-DOC x5	54.19	No	No	Direct Output Connection for a High-Rank Language Model	2018-08-30	Yes
13	AWD-FWM Schlag et al. (2020)	54.48	No	No	Learning Associative Inference Using Fast Weight Memory	2020-11-16	Yes
14	Ensemble of All	55.4	No	No	Advancing State of the Art in Language Modeling	2023-11-28	Yes
15	Mogrifier LSTM	57.3	No	No	Mogrifier LSTM	2019-09-04	Yes
16	AWD-LSTM-DOC + Partial Shuffle	60.16	No	No	Partially Shuffling the Training Data to Improve Language Models	2019-03-11	Yes
17	AWD-LSTM-DOC	60.29	No	No	Direct Output Connection for a High-Rank Language Model	2018-08-30	Yes
18	AWD-LSTM-MoS + Partial Shuffle	62.38	No	No	Partially Shuffling the Training Data to Improve Language Models	2019-03-11	Yes
19	AWD-LSTM-MoS	63.88	No	No	Breaking the Softmax Bottleneck: A High-Rank RNN Language Model	2017-11-10	Yes
20	AWD-LSTM-DRILL	64.9	No	No	Deep Residual Output Layers for Neural Language Generation	2019-05-14	Yes
21	AWD-LSTM 3-layer with Fraternal dropout	66.8	No	No	Fraternal Dropout	2017-10-31	Yes
22	AWD-LSTM + ATOI	67.47	No	No	Alleviating Sequence Information Loss with Data Overlapping and Prime Batch Sizes	2019-09-18	Yes
23	AWD-LSTM	68.6	No	Yes	Regularizing and Optimizing LSTM Language Models	2017-08-07	Yes
24	Melis et al. (2017) - 1-layer LSTM (tied)	69.3	No	Yes	On the State of the Art of Evaluation in Neural Language Models	2017-07-18	Yes
25	Inan et al. (2016) - Variational LSTM (tied) (h=650) + augmented loss	91.5	No	Yes	Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling	2016-11-04	Yes
26	Inan et al. (2016) - Variational LSTM (tied) (h=650)	92.3	No	Yes	Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling	2016-11-04	Yes
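Four base models appear in the results both with and without dynamic evaluation: AWD-LSTM, AWD-LSTM-MoS, AWD-LSTM-DRILL, and Mogrifier LSTM. The Python sketch below computes the absolute and relative drop in validation perplexity for each pair. It uses only the values listed above and pairs rows by model name. The AWD-LSTM pair combines rows from two different papers, so the comparison is indicative rather than a controlled ablation. The helper `dynamic_eval_gains` is a hypothetical name used only for this illustration.

```python
# Validation perplexities copied from the WikiText-2 results above:
# (base model, PPL without dynamic eval, PPL with dynamic eval)
PAIRS = [
    ("AWD-LSTM", 68.6, 46.4),
    ("AWD-LSTM-MoS", 63.88, 42.41),
    ("AWD-LSTM-DRILL", 64.9, 43.9),
    ("Mogrifier LSTM", 57.3, 40.2),
]


def dynamic_eval_gains(pairs):
    """Return (name, absolute PPL drop, relative drop in percent) per pair."""
    gains = []
    for name, base, dyn in pairs:
        drop = base - dyn
        gains.append((name, drop, 100.0 * drop / base))
    return gains


for name, drop, rel in dynamic_eval_gains(PAIRS):
    print(f"{name:<16} -{drop:5.2f} PPL ({rel:4.1f}% lower)")
```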