Metric: Accuracy (higher is better)
| # | Model | Accuracy (%) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | ALBERT (ensemble) | 91.4 | No | Improving Machine Reading Comprehension with Sin... | 2020-11-06 | - |
| 2 | Megatron-BERT (ensemble) | 90.9 | No | Megatron-LM: Training Multi-Billion Parameter La... | 2019-09-17 | Code |
| 3 | ALBERT-xxlarge + DUMA (ensemble) | 89.8 | No | DUMA: Reading Comprehension with Transposition T... | 2020-01-26 | Code |
| 4 | Megatron-BERT | 89.5 | No | Megatron-LM: Training Multi-Billion Parameter La... | 2019-09-17 | Code |
| 5 | DeBERTa-large | 86.8 | No | DeBERTa: Decoding-enhanced BERT with Disentangle... | 2020-06-05 | Code |
| 6 | B10-10-10 | 85.7 | No | Funnel-Transformer: Filtering out Sequential Red... | 2020-06-05 | Code |
| 7 | RoBERTa | 83.2 | No | RoBERTa: A Robustly Optimized BERT Pretraining A... | 2019-07-26 | Code |
| 8 | Orca 2-13B | 82.87 | No | Orca 2: Teaching Small Language Models How to Re... | 2023-11-18 | - |
| 9 | Orca 2-7B | 80.79 | No | Orca 2: Teaching Small Language Models How to Re... | 2023-11-18 | - |
| 10 | HAT (Encoder) | 67.3 | No | Hierarchical Learning for Generation with Long S... | 2021-04-15 | - |