Metric: Perplexity (lower is better)
| # | Model | Perplexity | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | Primer | 12.35 | No | Primer: Searching for Efficient Transformers for... | 2021-09-17 | Code |
| 2 | Zeropoint LLM.int8 13B (vector-wise + decomp) | 12.45 | No | LLM.int8(): 8-bit Matrix Multiplication for Tran... | 2022-08-15 | Code |
| 3 | T5++ | 12.69 | No | Primer: Searching for Efficient Transformers for... | 2021-09-17 | Code |
| 4 | Original T5 | 13.25 | No | Primer: Searching for Efficient Transformers for... | 2021-09-17 | Code |
| 5 | LLM.float32 6.7B | 13.30 | No | LLM.int8(): 8-bit Matrix Multiplication for Tran... | 2022-08-15 | Code |
| 6 | LLM.float32 2.7B | 14.43 | No | LLM.int8(): 8-bit Matrix Multiplication for Tran... | 2022-08-15 | Code |
| 7 | N-Grammer 343M | 14.79 | No | N-Grammer: Augmenting Transformers with latent n... | 2022-07-13 | Code |
| 8 | N-Grammer 288M | 15.01 | No | N-Grammer: Augmenting Transformers with latent n... | 2022-07-13 | Code |
| 9 | LLM.float32 1.3B | 15.91 | No | LLM.int8(): 8-bit Matrix Multiplication for Tran... | 2022-08-15 | Code |
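Perplexity, the metric ranked above, is the exponential of the average per-token negative log-likelihood. A minimal sketch of the computation (not tied to any particular paper's evaluation setup, which may differ in tokenization and normalization):

```python
import math

def perplexity(nlls):
    """Perplexity = exp(mean per-token negative log-likelihood, in nats).

    `nlls` is a list of per-token NLL values; lower perplexity means the
    model assigns higher probability to the held-out text.
    """
    return math.exp(sum(nlls) / len(nlls))

# An average NLL of roughly 2.514 nats/token corresponds to a perplexity
# of about 12.35, the top score in the table above.
```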
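Several entries quantize weights and activations with zero-point (asymmetric) int8 quantization applied vector-wise, as named in the "Zeropoint LLM.int8 13B (vector-wise + decomp)" row. A hedged sketch of row-wise zero-point quantization, for illustration only; this is not the paper's implementation, and it omits the mixed-precision decomposition that LLM.int8() uses for outlier features:

```python
import numpy as np

def zeropoint_quantize_rowwise(x):
    """Asymmetric int8 quantization with one scale and zero-point per row.

    Maps each row's [min, max] range onto the int8 range [-128, 127],
    so the full 256-level budget is used even for skewed distributions.
    """
    xmin = x.min(axis=1, keepdims=True)
    xmax = x.max(axis=1, keepdims=True)
    scale = (xmax - xmin) / 255.0
    scale = np.where(scale == 0, 1.0, scale)  # guard constant rows
    zero = np.round(-xmin / scale) - 128      # per-row zero-point
    q = np.clip(np.round(x / scale) + zero, -128, 127).astype(np.int8)
    return q, scale, zero

def dequantize(q, scale, zero):
    """Approximate reconstruction of the original float values."""
    return (q.astype(np.float32) - zero) * scale
```

Per-row (vector-wise) scales keep the quantization error proportional to each row's dynamic range rather than the whole tensor's, which is why the quantized 13B entry above lands within 0.1 perplexity of its float baseline.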