Language Modelling on Wiki-40B

Metric: Perplexity (lower is better)
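For reference, perplexity is usually defined as the exponential of the mean per-token negative log-likelihood, so a lower value means the model assigns higher probability to the held-out text. The sketch below assumes natural-log token probabilities are already available. The function name `perplexity` is hypothetical, and the leaderboard does not say which tokenization each entry used, so treat this as an illustration of the metric rather than the evaluation code behind these numbers.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Return exp(mean negative log-likelihood) over the given tokens.

    Assumes one natural-log probability per predicted token
    (hypothetical helper, not the leaderboard's evaluation code).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    # Average negative log-likelihood per token.
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    # Exponentiate to get perplexity.
    return math.exp(mean_nll)


# A model that gives each of 4 tokens probability 0.25 is, on average,
# as uncertain as a uniform choice among 4 options.
print(perplexity([math.log(0.25)] * 4))
```

Because perplexity is a monotone function of the average log-likelihood, ranking models by lower perplexity is equivalent to ranking them by lower per-token loss on the same tokenization.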

#  Model              Perplexity  Extra Data  Paper                                                Date        Code
1  FLASH-Quad-8k      14.998      No          Transformer Quality in Linear Time                   2022-02-21  Yes
2  Combiner-Axial-8k  16.49       No          Combiner: Full Attention Transformer with Sparse...  2021-07-12  Yes
3  Combiner-Fixed-8k  16.6        No          Combiner: Full Attention Transformer with Sparse...  2021-07-12  Yes