Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Language Modelling on One Billion Word

Metric: Perplexity (PPL); lower is better
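As a reminder of how the ranking metric is computed: perplexity is the exponential of the average per-token negative log-likelihood of the test set. A minimal sketch (the function name and loss values below are illustrative, not taken from any leaderboard entry):

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean per-token negative log-likelihood, in nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# A model averaging 3.0 nats of loss per token scores exp(3.0), about 20.09 PPL.
print(round(perplexity([2.8, 3.0, 3.2]), 2))
```

Lower average loss per token therefore maps directly to lower (better) PPL on this leaderboard.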


Results

| # | Model | PPL | Extra Data | Paper | Date | Code |
|---|-------|-----|------------|-------|------|------|
| 1 | MDLM (AR baseline) | 20.09 | No | Simple and Effective Masked Diffusion Language M... | 2024-06-11 | Code |
| 2 | OmniNetT (Large) | 21.5 | No | OmniNet: Omnidirectional Representations from Tr... | 2021-03-01 | Code |
| 3 | OmniNetP (Large) | 21.6 | No | OmniNet: Omnidirectional Representations from Tr... | 2021-03-01 | Code |
| 4 | Transformer-XL Large | 21.8 | No | Transformer-XL: Attentive Language Models Beyond... | 2019-01-09 | Code |
| 5 | OmniNetB (Large) | 22 | No | OmniNet: Omnidirectional Representations from Tr... | 2021-03-01 | Code |
| 6 | MDLM | 23 | No | Simple and Effective Masked Diffusion Language M... | 2024-06-11 | Code |
| 7 | Adaptive Input Very Large | 23.02 | No | Adaptive Input Representations for Neural Langua... | 2018-09-28 | Code |
| 8 | Transformer-XL Base | 23.5 | No | Transformer-XL: Attentive Language Models Beyond... | 2019-01-09 | Code |
| 9 | SRU++ Large | 23.5 | No | When Attention Meets Fast Recurrence: Training L... | 2021-02-24 | Code |
| 10 | 10 LSTM+CNN inputs + SNM10-SKIP (ensemble) | 23.7 | No | Exploring the Limits of Language Modeling | 2016-02-07 | Code |
| 11 | Adaptive Input Large | 23.91 | No | Adaptive Input Representations for Neural Langua... | 2018-09-28 | Code |
| 12 | Mesh Tensorflow | 24 | No | Mesh-TensorFlow: Deep Learning for Supercomputers | 2018-11-05 | Code |
| 13 | Cohere Large | 25.06 | No | - | - | - |
| 14 | SRU++ | 25.1 | No | When Attention Meets Fast Recurrence: Training L... | 2021-02-24 | Code |
| 15 | DynamicConv | 26.67 | No | Pay Less Attention with Lightweight and Dynamic ... | 2019-01-29 | Code |
| 16 | High-Budget MoE | 28 | No | Outrageously Large Neural Networks: The Sparsely... | 2017-01-23 | Code |
| 17 | Evolved Transformer Big | 28.6 | No | The Evolved Transformer | 2019-01-30 | Code |
| 18 | LSTM-8192-1024 + CNN Input | 30 | No | Exploring the Limits of Language Modeling | 2016-02-07 | Code |
| 19 | LSTM-8192-1024 | 30.6 | No | Exploring the Limits of Language Modeling | 2016-02-07 | Code |
| 20 | GCNN-14 bottleneck | 31.9 | No | Language Modeling with Gated Convolutional Netwo... | 2016-12-23 | Code |
| 21 | Low-Budget MoE | 34.1 | No | Outrageously Large Neural Networks: The Sparsely... | 2017-01-23 | Code |
| 22 | BIG G-LSTM-2 | 36 | No | Factorization tricks for LSTM networks | 2017-03-31 | Code |
| 23 | GPT-2 | 42.16 | Yes | - | - | Code |
| 24 | RNN-1024 + 9 Gram | 51.3 | No | One Billion Word Benchmark for Measuring Progres... | 2013-12-11 | Code |
| 25 | Sparse Non-Negative | 52.9 | No | Skip-gram Language Modeling Using Sparse Non-neg... | 2014-12-03 | - |