Language Modelling on Wiki-40B
Metric: Perplexity (lower is better)
Results
| # | Model | Perplexity | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | FLASH-Quad-8k | 14.998 | No | Transformer Quality in Linear Time | 2022-02-21 | Code |
| 2 | Combiner-Axial-8k | 16.49 | No | Combiner: Full Attention Transformer with Sparse... | 2021-07-12 | Code |
| 3 | Combiner-Fixed-8k | 16.6 | No | Combiner: Full Attention Transformer with Sparse... | 2021-07-12 | Code |
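
For readers unfamiliar with the metric, perplexity is conventionally the exponential of the average negative log-likelihood a model assigns to each token. The sketch below illustrates that definition only. The function name `perplexity` and its input format are hypothetical, it assumes natural-log token probabilities, and the papers listed above may normalise differently (for example per word rather than per subword token).

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities (illustrative helper)."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    # Average negative log-likelihood per token.
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    # Perplexity is the exponential of that average.
    return math.exp(avg_nll)


# A model that gives every token probability 1/4 has a perplexity of about 4.
uniform = [math.log(0.25)] * 8
print(perplexity(uniform))
```

Under this definition, a lower value means the model assigns higher probability to the held-out text, which is why the table ranks lower scores first.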