Text-To-SQL on BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation)
Metric: Execution Accuracy % (Dev) (higher is better)
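As a rough illustration of the metric, execution accuracy counts a predicted query as correct when running it returns the same rows as the reference (gold) query. The sketch below assumes SQLite databases and an order-insensitive, set-based comparison. The function names and the toy `singer` table are hypothetical, and the official BIRD evaluation script may differ in details such as timeouts and per-database handling.

```python
import sqlite3


def execution_match(conn, predicted_sql, gold_sql):
    """Return True if both queries yield the same set of rows."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # Assumption: a predicted query that fails to run counts as incorrect.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    # Order-insensitive comparison of result rows.
    return set(predicted_rows) == set(gold_rows)


def execution_accuracy(conn, pairs):
    """Percentage of (predicted, gold) query pairs whose results match."""
    if not pairs:
        return 0.0
    hits = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return 100.0 * hits / len(pairs)


def demo():
    # Hypothetical toy database, used only to exercise the functions above.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE singer (id INTEGER, name TEXT, age INTEGER);
        INSERT INTO singer VALUES (1, 'Ann', 30), (2, 'Bob', 45), (3, 'Cy', 52);
        """
    )
    pairs = [
        # Different SQL text, same result rows: counted as correct.
        ("SELECT name FROM singer WHERE age > 40",
         "SELECT name FROM singer WHERE age >= 45"),
        # Missing filter, extra row returned: counted as incorrect.
        ("SELECT name FROM singer",
         "SELECT name FROM singer WHERE age > 40"),
        # Misspelled column, query fails: counted as incorrect.
        ("SELECT nme FROM singer",
         "SELECT name FROM singer"),
        # Equivalent aggregates: counted as correct.
        ("SELECT COUNT(*) FROM singer",
         "SELECT COUNT(id) FROM singer"),
    ]
    return execution_accuracy(conn, pairs)


if __name__ == "__main__":
    print(demo())
```

Because the comparison is on query results and not on SQL text, two differently written queries can both count as correct, which is why the first and last pairs in the demo match.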
Results
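| # | Model | Execution Accuracy % (Dev) | Extra Data | SOTA badge | Paper | Date | Code |
|---|---|---|---|---|---|---|---|
| 1 | DSAIR + GPT-4o | 74.32 | No | - | - | - | - |
| 2 | XiYan-SQL | 73.34 | No | SOTA | A Preview of XiYan-SQL: A Multi-Generator Ensemble Framework for Text-to-SQL | 2024-11-13 | Yes |
| 3 | CHASE-SQL + Gemini | 73.14 | No | SOTA | CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL | 2024-10-02 | - |
| 4 | ExSL + granite-34b-code | 72.43 | No | - | - | - | - |
| 5 | Insights AI | 72.16 | No | - | - | - | - |
| 6 | OpenSearch-SQL+ v2 + GPT-4o | 69.3 | No | - | - | - | - |
| 7 | MCTS-SQL | 68.91 | No | - | - | - | - |
| 8 | PURPLE + RED + GPT-4o | 68.12 | No | - | - | - | - |
| 9 | Arcwise + GPT-4o | 67.99 | No | - | - | - | - |
| 10 | Distillery + GPT-4o | 67.21 | No | SOTA | The Death of Schema Linking? Text-to-SQL in the Age of Well-Reasoned Language Models | 2024-08-14 | - |
| 11 | RECAP + Gemini | 66.95 | No | - | - | - | - |
| 12 | MSL-SQL + DeepSeek-V2.5 | 66.82 | No | - | - | - | - |
| 13 | MSc-SQL | 65.6 | No | - | MSc-SQL: Multi-Sample Critiquing Small Language Models For Text-To-SQL Translation | 2024-10-16 | Yes |
| 14 | ByteBrain | 65.45 | No | - | - | - | - |
| 15 | ExSL + granite-20b-code | 65.38 | No | - | - | - | - |
| 16 | CHESS | 65 | No | SOTA | CHESS: Contextual Harnessing for Efficient SQL Synthesis | 2024-05-27 | Yes |
| 17 | SCL-SQL | 64.73 | No | - | - | - | - |
| 18 | SFT CodeS-15B + SQLFixAgent | 64.62 | No | - | - | - | - |
| 19 | MCS-SQL + GPT-4 | 63.36 | No | - | - | - | - |
| 20 | PURPLE + GPT-4o | 62.97 | No | - | - | - | - |
| 21 | GRA-SQL | 62.58 | No | - | - | - | - |
| 22 | OpenSearch-SQL v1 + GPT-4 | 61.34 | No | - | - | - | - |
| 23 | PB-SQL v1 | 60.5 | No | - | - | - | - |
| 24 | Dubo-SQL, v1 | 59.71 | No | - | - | - | - |
| 25 | SuperSQL | 58.5 | No | - | - | - | - |
| 26 | SFT CodeS-15B | 58.47 | No | - | - | - | - |
| 27 | MAC-SQL + GPT-4 | 57.56 | No | SOTA | MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL | 2023-12-18 | Yes |
| 28 | SFT CodeS-7B | 57.17 | No | - | - | - | - |
| 29 | SENSE-13B | 55.48 | No | - | - | - | - |
| 30 | SENSE | 55.48 | No | - | - | - | - |
| 31 | DAIL-SQL + GPT-4 | 54.76 | No | SOTA | Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation | 2023-08-29 | Yes |
| 32 | DIN-SQL + GPT-4 | 50.72 | No | SOTA | DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction | 2023-04-21 | Yes |
| 33 | DELLM + MAC-SQL | 48.92 | No | - | Knowledge-to-SQL: Enhancing SQL Generation with Data Expert LLM | 2024-02-18 | Yes |
| 34 | GPT-4 (Baseline) | 46.35 | No | - | Can LLMs Effectively Leverage Graph Structural Information through Prompts, and Why? | 2023-09-28 | Yes |
| 35 | Claude-2 (Baseline) | 42.7 | No | - | Can LLMs Effectively Leverage Graph Structural Information through Prompts, and Why? | 2023-09-28 | Yes |
| 36 | Open SQL-7B | 37.68 | No | - | - | - | - |
| 37 | ChatGPT (Baseline) | 37.22 | No | - | Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs | 2023-05-04 | Yes |
| 38 | CoT + ChatGPT | 36.64 | No | - | Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs | 2023-05-04 | Yes |
| 39 | Codex (Baseline) | 34.35 | No | - | Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs | 2023-05-04 | Yes |
| 40 | Palm-2 (Baseline) | 27.38 | No | - | Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs | 2023-05-04 | Yes |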