Metric: Execution Accuracy % (Test) (higher is better)
| # | Model | Execution Accuracy % (Test) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | XiYan-SQL | 75.63 | No | A Preview of XiYan-SQL: A Multi-Generator Ensemb... | 2024-11-13 | Code |
| 2 | DSAIR + GPT-4o | 74.12 | No | - | - | - |
| 3 | CHASE-SQL + Gemini | 74.06 | No | CHASE-SQL: Multi-Path Reasoning and Preference O... | 2024-10-02 | - |
| 4 | ExSL + granite-34b-code | 73.17 | No | - | - | - |
| 5 | OpenSearch-SQL+ v2 + GPT-4o | 72.28 | No | - | - | - |
| 6 | Distillery + GPT-4o | 71.83 | No | The Death of Schema Linking? Text-to-SQL in the ... | 2024-08-14 | - |
| 7 | Insights AI | 70.26 | No | - | - | - |
| 8 | PURPLE + RED + GPT-4o | 70.21 | No | - | - | - |
| 9 | MCTS-SQL | 69.40 | No | - | - | - |
| 10 | RECAP + Gemini | 69.03 | No | - | - | - |
| 11 | ByteBrain | 68.87 | No | - | - | - |
| 12 | ExSL + granite-20b-code | 67.86 | No | - | - | - |
| 13 | CHESS | 66.69 | No | CHESS: Contextual Harnessing for Efficient SQL S... | 2024-05-27 | Code |
| 14 | Arcwise + GPT-4o | 66.21 | No | - | - | - |
| 15 | MCS-SQL + GPT-4 | 65.45 | No | - | - | - |
| 16 | SCL-SQL | 65.23 | No | - | - | - |
| 17 | OpenSearch-SQL v1 + GPT-4 | 64.95 | No | - | - | - |
| 18 | PB-SQL v1 | 64.84 | No | - | - | - |
| 19 | PURPLE + GPT-4o | 64.51 | No | - | - | - |
| 20 | MSL-SQL + DeepSeek-V2.5 | 64.00 | No | - | - | - |
| 21 | SENSE-13B | 63.39 | No | - | - | - |
| 22 | SENSE | 63.39 | No | - | - | - |
| 23 | GRA-SQL | 63.22 | No | - | - | - |
| 24 | SuperSQL | 62.66 | No | - | - | - |
| 25 | Dubo-SQL, v1 | 60.71 | No | - | - | - |
| 26 | SFT CodeS-15B | 60.37 | No | - | - | - |
| 27 | MAC-SQL + GPT-4 | 59.59 | No | MAC-SQL: A Multi-Agent Collaborative Framework f... | 2023-12-18 | Code |
| 28 | SFT CodeS-7B | 59.25 | No | - | - | - |
| 29 | DAIL-SQL + GPT-4 | 57.41 | No | Text-to-SQL Empowered by Large Language Models: ... | 2023-08-29 | Code |
| 30 | DIN-SQL + GPT-4 | 55.90 | No | DIN-SQL: Decomposed In-Context Learning of Text-... | 2023-04-21 | Code |
| 31 | GPT-4 (Baseline) | 54.89 | No | Can LLMs Effectively Leverage Graph Structural I... | 2023-09-28 | Code |
| 32 | Claude-2 (Baseline) | 49.02 | No | Can LLMs Effectively Leverage Graph Structural I... | 2023-09-28 | Code |
| 33 | Open SQL-7B | 47.74 | No | - | - | - |
| 34 | CoT + ChatGPT | 40.08 | No | Can LLM Already Serve as A Database Interface? A... | 2023-05-04 | Code |
| 35 | ChatGPT (Baseline) | 39.30 | No | Can LLM Already Serve as A Database Interface? A... | 2023-05-04 | Code |
| 36 | Codex (Baseline) | 36.47 | No | Can LLM Already Serve as A Database Interface? A... | 2023-05-04 | Code |
| 37 | Palm-2 (Baseline) | 33.04 | No | Can LLM Already Serve as A Database Interface? A... | 2023-05-04 | Code |
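
For readers unfamiliar with the metric, execution accuracy is commonly understood as the percentage of examples where the predicted SQL query returns the same result as the reference query when both are run against the database. The sketch below illustrates that idea on a toy SQLite database. It is not the benchmark's official evaluator: the helper names `execution_match` and `execution_accuracy`, the `singer` table, and the choice to compare results as sets (ignoring row order and duplicates) are all assumptions made for illustration.

```python
import sqlite3


def execution_match(conn, predicted_sql, gold_sql):
    """Return True when both queries yield the same set of rows (hypothetical helper)."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # A prediction that fails to execute is counted as incorrect.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    # Assumption: compare as sets, so row order and duplicate rows are ignored.
    return set(predicted_rows) == set(gold_rows)


def execution_accuracy(conn, pairs):
    """Percentage of (predicted_sql, gold_sql) pairs whose results match."""
    if not pairs:
        return 0.0
    hits = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return 100.0 * hits / len(pairs)


# Toy in-memory database, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (id INTEGER, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO singer VALUES (?, ?, ?)",
    [(1, "Ana", 31), (2, "Bo", 24), (3, "Cy", 45)],
)

pairs = [
    # Different SQL text, same result: counted as correct.
    ("SELECT name FROM singer WHERE age > 30",
     "SELECT name FROM singer WHERE age >= 31"),
    # Executes, but returns different rows: incorrect.
    ("SELECT name FROM singer",
     "SELECT name FROM singer WHERE age < 30"),
    # Misspelled column, fails to execute: incorrect.
    ("SELECT nme FROM singer",
     "SELECT name FROM singer"),
    # Equivalent aggregates: counted as correct.
    ("SELECT COUNT(*) FROM singer",
     "SELECT COUNT(id) FROM singer"),
]

score = execution_accuracy(conn, pairs)
print(f"Execution accuracy: {score:.2f}%")
```

Because the comparison is on query results and not on SQL text, two differently written queries can both count as correct, which is why this kind of metric is usually preferred over exact string matching for text-to-SQL.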