Metric: Accuracy (higher is better)
| # | Model | Accuracy | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | PaLM 2 (few-shot, CoT, SC) | 90.4 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 2 | Rethinking with retrieval (GPT-3) | 77.73 | No | Rethinking with Retrieval: Faithful Large Langua... | 2022-12-31 | Code |
| 3 | Self-Evaluation Guided Decoding (Codex, CoT, single reasoning chain, 6-shot gen, 4-shot eval) | 77.2 | No | - | - | - |
| 4 | U-PaLM 540B | 76.6 | No | Transcending Scaling Laws with 0.1% Extra Compute | 2022-10-20 | - |
| 5 | PaLM 540B | 76.4 | No | Transcending Scaling Laws with 0.1% Extra Compute | 2022-10-20 | - |
| 6 | Minerva 540B | 61.9 | No | Transcending Scaling Laws with 0.1% Extra Compute | 2022-10-20 | - |