Metric: Accuracy (higher is better)
| # | Model | Accuracy (%) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | ReST meets ReAct (PaLM 2-L + Google Search) | 76.1 | No | ReST meets ReAct: Self-Improvement for Multi-Ste... | 2023-12-15 | - |
| 2 | MCR (code-davinci-002) + Google Search | 66.5 | No | Answering Questions by Meta-Reasoning over Multi... | 2023-04-25 | Code |
| 3 | RALM (LLaMA2-13B + Google Search) | 62.7 | No | Making Retrieval-Augmented Language Models Robus... | 2023-10-02 | Code |
| 4 | Self-ask (GPT-3; davinci-002) + Google Search | 60.0 | No | Measuring and Narrowing the Compositionality Gap... | 2022-10-07 | Code |
| 5 | Self-ask (GPT-3; davinci-002) | 57.6 | No | Measuring and Narrowing the Compositionality Gap... | 2022-10-07 | Code |
| 6 | Chain-of-Thought (GPT-3; davinci-002) | 46.4 | No | Measuring and Narrowing the Compositionality Gap... | 2022-10-07 | Code |
| 7 | FireAct | 44.0 | No | FireAct: Toward Language Agent Fine-tuning | 2023-10-09 | - |
| 8 | Direct Prompting (GPT-3; davinci-002) | 17.6 | No | Measuring and Narrowing the Compositionality Gap... | 2022-10-07 | Code |
| 9 | Google Search | 0.0 | No | - | - | Code |