Metric: Execution Accuracy % (Dev) (higher is better)
| # | Model | Execution Accuracy % (Dev) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | DSAIR + GPT-4o | 74.32 | No | - | - | - |
| 2 | XiYan-SQL | 73.34 | No | A Preview of XiYan-SQL: A Multi-Generator Ensemb... | 2024-11-13 | Code |
| 3 | CHASE-SQL + Gemini | 73.14 | No | CHASE-SQL: Multi-Path Reasoning and Preference O... | 2024-10-02 | - |
| 4 | ExSL + granite-34b-code | 72.43 | No | - | - | - |
| 5 | Insights AI | 72.16 | No | - | - | - |
| 6 | OpenSearch-SQL+ v2 + GPT-4o | 69.3 | No | - | - | - |
| 7 | MCTS-SQL | 68.91 | No | - | - | - |
| 8 | PURPLE + RED + GPT-4o | 68.12 | No | - | - | - |
| 9 | Arcwise + GPT-4o | 67.99 | No | - | - | - |
| 10 | Distillery + GPT-4o | 67.21 | No | The Death of Schema Linking? Text-to-SQL in the ... | 2024-08-14 | - |
| 11 | RECAP + Gemini | 66.95 | No | - | - | - |
| 12 | MSL-SQL + DeepSeek-V2.5 | 66.82 | No | - | - | - |
| 13 | MSc-SQL | 65.6 | No | MSc-SQL: Multi-Sample Critiquing Small Language ... | 2024-10-16 | Code |
| 14 | ByteBrain | 65.45 | No | - | - | - |
| 15 | ExSL + granite-20b-code | 65.38 | No | - | - | - |
| 16 | CHESS | 65 | No | CHESS: Contextual Harnessing for Efficient SQL S... | 2024-05-27 | Code |
| 17 | SCL-SQL | 64.73 | No | - | - | - |
| 18 | SFT CodeS-15B + SQLFixAgent | 64.62 | No | - | - | - |
| 19 | MCS-SQL + GPT-4 | 63.36 | No | - | - | - |
| 20 | PURPLE + GPT-4o | 62.97 | No | - | - | - |
| 21 | GRA-SQL | 62.58 | No | - | - | - |
| 22 | OpenSearch-SQL v1 + GPT-4 | 61.34 | No | - | - | - |
| 23 | PB-SQL v1 | 60.5 | No | - | - | - |
| 24 | Dubo-SQL, v1 | 59.71 | No | - | - | - |
| 25 | SuperSQL | 58.5 | No | - | - | - |
| 26 | SFT CodeS-15B | 58.47 | No | - | - | - |
| 27 | MAC-SQL + GPT-4 | 57.56 | No | MAC-SQL: A Multi-Agent Collaborative Framework f... | 2023-12-18 | Code |
| 28 | SFT CodeS-7B | 57.17 | No | - | - | - |
| 29 | SENSE-13B | 55.48 | No | - | - | - |
| 30 | SENSE | 55.48 | No | - | - | - |
| 31 | DAIL-SQL + GPT-4 | 54.76 | No | Text-to-SQL Empowered by Large Language Models: ... | 2023-08-29 | Code |
| 32 | DIN-SQL + GPT-4 | 50.72 | No | DIN-SQL: Decomposed In-Context Learning of Text-... | 2023-04-21 | Code |
| 33 | DELLM + MAC-SQL | 48.92 | No | Knowledge-to-SQL: Enhancing SQL Generation with ... | 2024-02-18 | Code |
| 34 | GPT-4 (Baseline) | 46.35 | No | Can LLMs Effectively Leverage Graph Structural I... | 2023-09-28 | Code |
| 35 | Claude-2 (Baseline) | 42.7 | No | Can LLMs Effectively Leverage Graph Structural I... | 2023-09-28 | Code |
| 36 | Open SQL-7B | 37.68 | No | - | - | - |
| 37 | ChatGPT (Baseline) | 37.22 | No | Can LLM Already Serve as A Database Interface? A... | 2023-05-04 | Code |
| 38 | CoT + ChatGPT | 36.64 | No | Can LLM Already Serve as A Database Interface? A... | 2023-05-04 | Code |
| 39 | Codex (Baseline) | 34.35 | No | Can LLM Already Serve as A Database Interface? A... | 2023-05-04 | Code |
| 40 | Palm-2 (Baseline) | 27.38 | No | Can LLM Already Serve as A Database Interface? A... | 2023-05-04 | Code |
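
The page does not define the metric it ranks by. As a rough illustration, the sketch below assumes the usual text-to-SQL reading of execution accuracy: a predicted query counts as correct when it returns the same set of rows as the reference query on the same database. The helper names (`execution_match`, `execution_accuracy`), the toy `singer` table, and the example queries are all hypothetical, and the official evaluation script behind these numbers may differ in details such as row ordering, duplicate rows, or timeouts.

```python
import sqlite3


def execution_match(conn, predicted_sql, gold_sql):
    """Return True when both queries yield the same set of rows."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # Assumption: a prediction that fails to execute counts as incorrect.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    # Set comparison ignores row order and duplicate rows.
    return set(predicted_rows) == set(gold_rows)


def execution_accuracy(conn, pairs):
    """Percentage of (predicted, gold) query pairs whose results match."""
    if not pairs:
        return 0.0
    correct = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return 100.0 * correct / len(pairs)


# Toy in-memory database, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO singer VALUES (?, ?)",
    [("Ann", 30), ("Bo", 41), ("Cy", 25)],
)

# (predicted SQL, gold SQL) pairs
pairs = [
    # Different text, same rows: counted as correct.
    ("SELECT name FROM singer WHERE age > 28",
     "SELECT name FROM singer WHERE age >= 30"),
    # Different counts: counted as incorrect.
    ("SELECT COUNT(*) FROM singer",
     "SELECT COUNT(*) FROM singer WHERE age < 40"),
    # Syntax error in the prediction: counted as incorrect.
    ("SELEC name FROM singer",
     "SELECT name FROM singer"),
    # Only the ordering differs: counted as correct under set comparison.
    ("SELECT name FROM singer ORDER BY age",
     "SELECT name FROM singer"),
]

score = execution_accuracy(conn, pairs)
print(f"Execution accuracy: {score:.2f}%")
```

Comparing result sets rather than SQL strings is what lets two differently written queries both count as correct. It also means a query can match by coincidence on a small database, which is one reason scores like those in the table are usually read alongside the size and variety of the evaluation data.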