# Language Modelling on One Billion Word

Metric: PPL (lower is better)
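Perplexity (PPL) is the exponential of the average negative log-likelihood the model assigns to each token, so lower values mean the model is less "surprised" by the test set. A minimal sketch of the computation (the function name and input format are illustrative, not from any specific benchmark toolkit):

```python
import math

def perplexity(token_logprobs):
    """Corpus-level perplexity from per-token natural-log probabilities:
    PPL = exp(-(1/N) * sum_i log p(token_i))."""
    n = len(token_logprobs)
    avg_nll = -sum(token_logprobs) / n  # average negative log-likelihood
    return math.exp(avg_nll)

# Toy example: assigning probability 0.25 to each of 4 tokens
# gives PPL = exp(-ln 0.25) = 4, i.e. the model is as uncertain
# as a uniform choice over 4 tokens.
print(round(perplexity([math.log(0.25)] * 4), 6))
```

Note that published numbers can differ depending on tokenization: a model scored over subwords and one scored over whole words are not directly comparable, which is one reason benchmarks like One Billion Word fix a shared vocabulary.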
| # | Model | PPL | Extra Data | Paper | Date | Code |
|---|-------|-----|------------|-------|------|------|
| 1 | MDLM (AR baseline) | 20.09 | No | Simple and Effective Masked Diffusion Language M... | 2024-06-11 | Code |
| 2 | OmniNetT (Large) | 21.5 | No | OmniNet: Omnidirectional Representations from Tr... | 2021-03-01 | Code |
| 3 | OmniNetP (Large) | 21.6 | No | OmniNet: Omnidirectional Representations from Tr... | 2021-03-01 | Code |
| 4 | Transformer-XL Large | 21.8 | No | Transformer-XL: Attentive Language Models Beyond... | 2019-01-09 | Code |
| 5 | OmniNetB (Large) | 22 | No | OmniNet: Omnidirectional Representations from Tr... | 2021-03-01 | Code |
| 6 | MDLM | 23 | No | Simple and Effective Masked Diffusion Language M... | 2024-06-11 | Code |
| 7 | Adaptive Input Very Large | 23.02 | No | Adaptive Input Representations for Neural Langua... | 2018-09-28 | Code |
| 8 | Transformer-XL Base | 23.5 | No | Transformer-XL: Attentive Language Models Beyond... | 2019-01-09 | Code |
| 9 | SRU++ Large | 23.5 | No | When Attention Meets Fast Recurrence: Training L... | 2021-02-24 | Code |
| 10 | 10 LSTM+CNN inputs + SNM10-SKIP (ensemble) | 23.7 | No | Exploring the Limits of Language Modeling | 2016-02-07 | Code |
| 11 | Adaptive Input Large | 23.91 | No | Adaptive Input Representations for Neural Langua... | 2018-09-28 | Code |
| 12 | Mesh Tensorflow | 24 | No | Mesh-TensorFlow: Deep Learning for Supercomputers | 2018-11-05 | Code |
| 13 | Cohere Large | 25.06 | No | - | - | - |
| 14 | SRU++ | 25.1 | No | When Attention Meets Fast Recurrence: Training L... | 2021-02-24 | Code |
| 15 | DynamicConv | 26.67 | No | Pay Less Attention with Lightweight and Dynamic ... | 2019-01-29 | Code |
| 16 | High-Budget MoE | 28 | No | Outrageously Large Neural Networks: The Sparsely... | 2017-01-23 | Code |
| 17 | Evolved Transformer Big | 28.6 | No | The Evolved Transformer | 2019-01-30 | Code |
| 18 | LSTM-8192-1024 + CNN Input | 30 | No | Exploring the Limits of Language Modeling | 2016-02-07 | Code |
| 19 | LSTM-8192-1024 | 30.6 | No | Exploring the Limits of Language Modeling | 2016-02-07 | Code |
| 20 | GCNN-14 bottleneck | 31.9 | No | Language Modeling with Gated Convolutional Netwo... | 2016-12-23 | Code |
| 21 | Low-Budget MoE | 34.1 | No | Outrageously Large Neural Networks: The Sparsely... | 2017-01-23 | Code |
| 22 | BIG G-LSTM-2 | 36 | No | Factorization tricks for LSTM networks | 2017-03-31 | Code |
| 23 | GPT-2 | 42.16 | Yes | - | - | Code |
| 24 | RNN-1024 + 9 Gram | 51.3 | No | One Billion Word Benchmark for Measuring Progres... | 2013-12-11 | Code |
| 25 | Sparse Non-Negative | 52.9 | No | Skip-gram Language Modeling Using Sparse Non-neg... | 2014-12-03 | - |