Language Modelling on Text8
Metric: Bits per Character (BPC) (lower is better)
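For reference, BPC is the model's average negative log-likelihood per character, expressed in base 2. A minimal sketch of the computation follows (the `bits_per_character` helper and its input format are illustrative, not part of this leaderboard):

```python
import math

def bits_per_character(log_probs):
    """Average negative log-likelihood per character, in bits.

    `log_probs` is a list of the model's natural-log probabilities,
    one per character of the evaluation text (hypothetical format).
    BPC = -(1/N) * sum_i log2 p(c_i) = mean NLL in nats / ln 2.
    """
    n = len(log_probs)
    return -sum(log_probs) / (n * math.log(2))

# A model assigning probability 0.5 to every character scores
# exactly 1 bit per character:
print(bits_per_character([math.log(0.5)] * 4))  # -> 1.0
```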
Results
Sorted by BPC in descending order, so later rows are stronger results.

| # | Model | BPC | Extra Data | Paper | Date | Code |
|---|-------|-----|------------|-------|------|------|
| 1 | td-LSTM (Zhang et al., 2016) | 1.63 | No | Architectural Complexity Measures of Recurrent Neural Networks | 2016-02-26 | - |
| 2 | td-LSTM-large | 1.49 | No | Architectural Complexity Measures of Recurrent Neural Networks | 2016-02-26 | - |
| 3 | BFN | 1.41 | No | Bayesian Flow Networks | 2023-08-14 | Yes |
| 4 | Unregularised mLSTM | 1.4 | No | Multiplicative LSTM for sequence modelling | 2016-09-26 | Yes |
| 5 | BN LSTM | 1.36 | No | Recurrent Batch Normalization | 2016-03-30 | Yes |
| 6 | LayerNorm HM-LSTM | 1.29 | No | Hierarchical Multiscale Recurrent Neural Networks | 2016-09-06 | Yes |
| 7 | Large RHN | 1.27 | No | Recurrent Highway Networks | 2016-07-12 | Yes |
| 8 | Large mLSTM +emb +WN +VD | 1.27 | No | Multiplicative LSTM for sequence modelling | 2016-09-26 | Yes |
| 9 | Bipartite flows (8 flows) | 1.23 | No | Discrete Flows: Invertible Generative Models of Discrete Data | 2019-05-24 | Yes |
| 10 | mLSTM + dynamic eval | 1.19 | No | Dynamic Evaluation of Neural Sequence Models | 2017-09-21 | Yes |
| 11 | 12-layer Character Transformer Model | 1.18 | No | Character-Level Language Modeling with Deeper Self-Attention | 2018-08-09 | Yes |
| 12 | PAR Transformer 24B | 1.18 | No | Pay Attention when Required | 2020-09-09 | Yes |
| 13 | GAM-RHN-10 | 1.157 | No | - | - | Yes |
| 14 | 64-layer Character Transformer Model | 1.13 | No | Character-Level Language Modeling with Deeper Self-Attention | 2018-08-09 | Yes |
| 15 | 12L Transformer + 8K adaptive span | 1.11 | No | Adaptive Attention Span in Transformers | 2019-05-19 | Yes |
| 16 | All-attention network - 18 layers | 1.11 | No | Augmenting Self-attention with Persistent Memory | 2019-07-02 | Yes |
| 17 | BP-Transformer - 12 Layers | 1.11 | No | BP-Transformer: Modelling Long-Range Context via Binary Partitioning | 2019-11-11 | Yes |
| 18 | Transformer-LS (small) | 1.09 | No | Long-Short Transformer: Efficient Transformers for Language and Vision | 2021-07-05 | Yes |
| 19 | Transformer-XL - 24 layers | 1.08 | No | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | 2019-01-09 | Yes |
| 20 | All-attention network - 36 layers | 1.08 | No | Augmenting Self-attention with Persistent Memory | 2019-07-02 | Yes |
| 21 | 24L Transformer + 8K adaptive span | 1.07 | No | Adaptive Attention Span in Transformers | 2019-05-19 | Yes |
| 22 | Transformer-XL + RMS dynamic eval + decay | 1.038 | No | Dynamic Evaluation of Transformer Language Models | 2019-04-17 | Yes |
| 23 | GPT-2 | 0.98 | Yes | - | - | Yes |
| 24 | Focus | 0.98 | No | Focus Your Attention (with Adaptive IIR Filters) | 2023-05-24 | - |
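For context on the numbers above, the conventional Text8 setup (an assumption based on common practice, not stated on this page) trains on the first 90M characters of the 100M-character corpus, validates on the next 5M, and tests on the final 5M; the alphabet is 26 lowercase letters plus space. A uniform model over those 27 symbols would score log2(27) ≈ 4.75 BPC, so the ~1.0 BPC entries above represent a roughly fivefold compression over that baseline. A rough loading sketch:

```python
import io
import math
import urllib.request
import zipfile

# Text8 download URL from Matt Mahoney's site; the 90M/5M/5M split
# below is the common convention, assumed rather than taken from
# this page.
URL = "http://mattmahoney.net/dc/text8.zip"

def load_text8():
    raw = urllib.request.urlopen(URL).read()
    text = zipfile.ZipFile(io.BytesIO(raw)).read("text8").decode("ascii")
    return text[:90_000_000], text[90_000_000:95_000_000], text[95_000_000:]

# Uniform baseline over the 27-symbol alphabet (26 letters + space):
print(math.log2(27))  # -> ~4.755 BPC
```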