Metric: Average (%) (higher is better)
| # | Model | Average (%) | Augmentations | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | Qwen2.5-72B | 86.3 | No | - | - | - |
| 2 | Jiutian large model | 86.1 | No | - | - | - |
| 3 | Llama-3-405B | 85.9 | No | - | - | - |
| 4 | Jiutian-57B | 84.07 | No | - | - | - |
| 5 | Qwen2-72B | 82.4 | No | - | - | - |
| 6 | Llama-3-70B | 81.0 | No | - | - | - |
| 7 | Flan-PaLM 540B (3-shot, fine-tuned, CoT + SC) | 78.4 | No | Scaling Instruction-Finetuned Language Models | 2022-10-20 | Code |
| 8 | PaLM 540B (CoT + self-consistency) | 78.2 | No | Scaling Instruction-Finetuned Language Models | 2022-10-20 | Code |
| 9 | code-davinci-002 175B (CoT) | 73.5 | No | Evaluating Large Language Models Trained on Code | 2021-07-07 | Code |
| 10 | Flan-PaLM 540B (3-shot, fine-tuned, CoT) | 72.4 | No | Scaling Instruction-Finetuned Language Models | 2022-10-20 | Code |
| 11 | PaLM 540B (CoT) | 71.2 | No | Scaling Instruction-Finetuned Language Models | 2022-10-20 | Code |
| 12 | Flan-PaLM 540B (5-shot, fine-tuned) | 70.0 | No | Scaling Instruction-Finetuned Language Models | 2022-10-20 | Code |
| 13 | PaLM 540B | 62.7 | No | Scaling Instruction-Finetuned Language Models | 2022-10-20 | Code |
| 14 | Orca 2-13B | 50.18 | No | Orca 2: Teaching Small Language Models How to Reason | 2023-11-18 | - |
| 15 | Orca 2-7B | 45.93 | No | Orca 2: Teaching Small Language Models How to Reason | 2023-11-18 | - |
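For working with a snapshot of this leaderboard programmatically, a table like the one above can be parsed and re-sorted with a few lines of Python. This is a minimal generic sketch, not an official export tool; the three-row table literal below is an abbreviated excerpt used only for illustration.

```python
# Minimal sketch: parse a markdown leaderboard table and sort rows by the
# "Average (%)" column (higher is better, as stated above the table).
# The table literal is an abbreviated excerpt of the leaderboard for demo purposes.

MARKDOWN_TABLE = """\
| # | Model | Average (%) | Augmentations |
|---|---|---|---|
| 6 | Llama-3-70B | 81.0 | No |
| 1 | Qwen2.5-72B | 86.3 | No |
| 13 | PaLM 540B | 62.7 | No |
"""

def parse_rows(table: str) -> list[dict]:
    """Parse a GitHub-style markdown table into a list of row dicts."""
    lines = [ln.strip() for ln in table.strip().splitlines()]
    header = [cell.strip() for cell in lines[0].strip("|").split("|")]
    rows = []
    for ln in lines[2:]:  # skip the |---| separator line
        cells = [cell.strip() for cell in ln.strip("|").split("|")]
        rows.append(dict(zip(header, cells)))
    return rows

rows = parse_rows(MARKDOWN_TABLE)
# Sort descending by score, since a higher average is better.
rows.sort(key=lambda r: float(r["Average (%)"]), reverse=True)
for r in rows:
    print(r["Model"], r["Average (%)"])
```

Note that this naive split on `|` breaks if a cell itself contains a pipe character; none of the cells in this table do.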