Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Language Modelling on Text8

Metric: Bit per Character (BPC) (lower is better)
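For reference, BPC is the average negative log2-probability a model assigns to each character of the test text; equivalently, the per-character cross-entropy in nats divided by ln 2. The snippet below is a minimal illustrative sketch of that conversion; the function name and arguments are ours, not part of any benchmark tooling.

```python
import math

def bits_per_character(total_nll_nats: float, num_characters: int) -> float:
    """Illustrative only: convert a summed negative log-likelihood (in nats)
    over a character sequence into bits per character (BPC). Lower is better."""
    return total_nll_nats / (num_characters * math.log(2))

# Example: a model assigning each character probability 0.5 has a cross-entropy
# of ln(2) nats per character, i.e. exactly 1.0 BPC.
print(bits_per_character(total_nll_nats=math.log(2) * 1000, num_characters=1000))  # 1.0
```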


Results

| # | Model | Bit per Character (BPC) | Extra Data | Paper | Date | Code |
|---|-------|-------------------------|------------|-------|------|------|
| 1 | td-LSTM (Zhang et al., 2016) | 1.63 | No | Architectural Complexity Measures of Recurrent N... | 2016-02-26 | - |
| 2 | td-LSTM-large | 1.49 | No | Architectural Complexity Measures of Recurrent N... | 2016-02-26 | - |
| 3 | BFN | 1.41 | No | Bayesian Flow Networks | 2023-08-14 | Code |
| 4 | Unregularised mLSTM | 1.4 | No | Multiplicative LSTM for sequence modelling | 2016-09-26 | Code |
| 5 | BN LSTM | 1.36 | No | Recurrent Batch Normalization | 2016-03-30 | Code |
| 6 | LayerNorm HM-LSTM | 1.29 | No | Hierarchical Multiscale Recurrent Neural Networks | 2016-09-06 | Code |
| 7 | Large RHN | 1.27 | No | Recurrent Highway Networks | 2016-07-12 | Code |
| 8 | Large mLSTM +emb +WN +VD | 1.27 | No | Multiplicative LSTM for sequence modelling | 2016-09-26 | Code |
| 9 | Bipartite flows (8 flows) | 1.23 | No | Discrete Flows: Invertible Generative Models of ... | 2019-05-24 | Code |
| 10 | mLSTM + dynamic eval | 1.19 | No | Dynamic Evaluation of Neural Sequence Models | 2017-09-21 | Code |
| 11 | 12-layer Character Transformer Model | 1.18 | No | Character-Level Language Modeling with Deeper Se... | 2018-08-09 | Code |
| 12 | PAR Transformer 24B | 1.18 | No | Pay Attention when Required | 2020-09-09 | Code |
| 13 | GAM-RHN-10 | 1.157 | No | - | - | Code |
| 14 | 64-layer Character Transformer Model | 1.13 | No | Character-Level Language Modeling with Deeper Se... | 2018-08-09 | Code |
| 15 | 12L Transformer + 8K adaptive span | 1.11 | No | Adaptive Attention Span in Transformers | 2019-05-19 | Code |
| 16 | All-attention network - 18 layers | 1.11 | No | Augmenting Self-attention with Persistent Memory | 2019-07-02 | Code |
| 17 | BP-Transformer - 12 Layers | 1.11 | No | BP-Transformer: Modelling Long-Range Context via... | 2019-11-11 | Code |
| 18 | Transformer-LS (small) | 1.09 | No | Long-Short Transformer: Efficient Transformers f... | 2021-07-05 | Code |
| 19 | Transformer-XL - 24 layers | 1.08 | No | Transformer-XL: Attentive Language Models Beyond... | 2019-01-09 | Code |
| 20 | All-attention network - 36 layers | 1.08 | No | Augmenting Self-attention with Persistent Memory | 2019-07-02 | Code |
| 21 | 24L Transformer + 8K adaptive span | 1.07 | No | Adaptive Attention Span in Transformers | 2019-05-19 | Code |
| 22 | Transformer-XL + RMS dynamic eval + decay | 1.038 | No | Dynamic Evaluation of Transformer Language Models | 2019-04-17 | Code |
| 23 | GPT-2 | 0.98 | Yes | - | - | Code |
| 24 | Focus | 0.98 | No | Focus Your Attention (with Adaptive IIR Filters) | 2023-05-24 | - |