Metric: Accuracy (higher is better)
| # | Model | Accuracy | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | PSQ (Chen et al., 2020) | 86.8 | No | A Statistical Framework for Low-bitwidth Trainin... | 2020-10-27 | Yes |
| 2 | Q8BERT (Zafrir et al., 2019) | 84.8 | No | Q8BERT: Quantized 8Bit BERT | 2019-10-14 | Yes |
| 3 | Q-BERT (Shen et al., 2020) | 84.7 | No | Q-BERT: Hessian Based Ultra Low Precision Quanti... | 2019-09-12 | - |
| 4 | KiC-770M | 74 | No | Knowledge-in-Context: Towards Knowledgeable Semi... | 2022-10-28 | - |
| 5 | Flipped-3B | 71.05 | No | Guess the Instruction! Flipped Learning Makes La... | 2022-10-06 | Yes |
| 6 | RoE-3B | 64.01 | No | Exploring the Benefits of Training Expert Langua... | 2023-02-07 | Yes |
| 7 | ELC-BERT-base 98M (zero init) | 63 | No | Not all layers are equally as important: Every L... | 2023-11-03 | - |
| 8 | ELC-BERT-small 24M | 55.4 | No | Not all layers are equally as important: Every L... | 2023-11-03 | - |
| 9 | LTG-BERT-base 98M | 54.7 | No | Not all layers are equally as important: Every L... | 2023-11-03 | - |
| 10 | LTG-BERT-small 24M | 53.7 | No | Not all layers are equally as important: Every L... | 2023-11-03 | - |
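The three top-ranked entries (PSQ, Q8BERT, Q-BERT) all rely on low-bitwidth quantization of model weights. As an illustration of the general idea only (not any of these papers' specific methods), a minimal sketch of symmetric uniform 8-bit quantization looks like this; the function name and list-based representation are assumptions for the example:

```python
def quantize_int8(weights):
    """Symmetric uniform quantization of a list of floats to the
    signed 8-bit range [-127, 127], plus dequantized values.

    Returns (quantized ints, dequantized floats, scale).
    """
    # One scale per tensor: largest magnitude maps to 127.
    # The `or 1.0` guards against an all-zero weight list.
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [round(w / scale) for w in weights]        # integers in [-127, 127]
    deq = [v * scale for v in q]                   # float approximation
    return q, deq, scale
```

The quantization error per weight is bounded by half the scale, which is why accuracy in the table degrades only mildly at 8 bits; the ultra-low-precision settings studied in Q-BERT require more care (e.g. Hessian-aware, per-group scales).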