Text-To-SQL on BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation)
Metric: Execution Accuracy % (Test) (higher is better)
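As a rough illustration of what this metric measures, the sketch below runs a predicted query and a reference ("gold") query against the same database and counts a prediction as correct when both return the same set of rows. It is a minimal sketch under stated assumptions, not the official BIRD evaluation script: the helper names `execution_match` and `execution_accuracy`, the toy `singer` table, and the example queries are all hypothetical, and details such as timeouts or row ordering are ignored.

```python
import sqlite3


def execution_match(conn, predicted_sql, gold_sql):
    """Return True when both queries yield the same set of result rows."""
    try:
        predicted_rows = set(conn.execute(predicted_sql).fetchall())
    except sqlite3.Error:
        # Assumption: a predicted query that fails to execute counts as wrong.
        return False
    gold_rows = set(conn.execute(gold_sql).fetchall())
    return predicted_rows == gold_rows


def execution_accuracy(conn, pairs):
    """Percentage of (predicted, gold) query pairs whose result sets match."""
    if not pairs:
        return 0.0
    hits = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return 100.0 * hits / len(pairs)


# Hypothetical toy database used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    INSERT INTO singer VALUES (1, 'Ana', 30), (2, 'Bo', 41), (3, 'Cy', 25);
    """
)

pairs = [
    # Different SQL text, same rows: counted as correct.
    ("SELECT name FROM singer WHERE age > 28",
     "SELECT name FROM singer WHERE age >= 30"),
    # Returns extra rows: counted as wrong.
    ("SELECT name FROM singer",
     "SELECT name FROM singer WHERE age < 30"),
    # Misspelled column, fails to execute: counted as wrong.
    ("SELECT nme FROM singer",
     "SELECT name FROM singer"),
    # Equivalent aggregates: counted as correct.
    ("SELECT COUNT(*) FROM singer",
     "SELECT COUNT(id) FROM singer"),
]

score = execution_accuracy(conn, pairs)
print(f"Execution accuracy: {score:.2f}%")
```

Because the comparison is on query results rather than SQL text, two differently written queries can both be scored as correct, which is why the metric is reported as a percentage of matching executions.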
Results
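| # | Model | Execution Accuracy % (Test) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | XiYan-SQL | 75.63 | No | A Preview of XiYan-SQL: A Multi-Generator Ensemble Framework for Text-to-SQL | 2024-11-13 | Code |
| 2 | DSAIR + GPT-4o | 74.12 | No | - | - | - |
| 3 | CHASE-SQL + Gemini | 74.06 | No | CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL | 2024-10-02 | - |
| 4 | ExSL + granite-34b-code | 73.17 | No | - | - | - |
| 5 | OpenSearch-SQL+ v2 + GPT-4o | 72.28 | No | - | - | - |
| 6 | Distillery + GPT-4o | 71.83 | No | The Death of Schema Linking? Text-to-SQL in the Age of Well-Reasoned Language Models | 2024-08-14 | - |
| 7 | Insights AI | 70.26 | No | - | - | - |
| 8 | PURPLE + RED + GPT-4o | 70.21 | No | - | - | - |
| 9 | MCTS-SQL | 69.4 | No | - | - | - |
| 10 | RECAP + Gemini | 69.03 | No | - | - | - |
| 11 | ByteBrain | 68.87 | No | - | - | - |
| 12 | ExSL + granite-20b-code | 67.86 | No | - | - | - |
| 13 | CHESS | 66.69 | No | CHESS: Contextual Harnessing for Efficient SQL Synthesis | 2024-05-27 | Code |
| 14 | Arcwise + GPT-4o | 66.21 | No | - | - | - |
| 15 | MCS-SQL + GPT-4 | 65.45 | No | - | - | - |
| 16 | SCL-SQL | 65.23 | No | - | - | - |
| 17 | OpenSearch-SQL v1 + GPT-4 | 64.95 | No | - | - | - |
| 18 | PB-SQL v1 | 64.84 | No | - | - | - |
| 19 | PURPLE + GPT-4o | 64.51 | No | - | - | - |
| 20 | MSL-SQL + DeepSeek-V2.5 | 64 | No | - | - | - |
| 21 | SENSE-13B | 63.39 | No | - | - | - |
| 22 | SENSE | 63.39 | No | - | - | - |
| 23 | GRA-SQL | 63.22 | No | - | - | - |
| 24 | SuperSQL | 62.66 | No | - | - | - |
| 25 | Dubo-SQL, v1 | 60.71 | No | - | - | - |
| 26 | SFT CodeS-15B | 60.37 | No | - | - | - |
| 27 | MAC-SQL + GPT-4 | 59.59 | No | MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL | 2023-12-18 | Code |
| 28 | SFT CodeS-7B | 59.25 | No | - | - | - |
| 29 | DAIL-SQL + GPT-4 | 57.41 | No | Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation | 2023-08-29 | Code |
| 30 | DIN-SQL + GPT-4 | 55.9 | No | DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction | 2023-04-21 | Code |
| 31 | GPT-4 (Baseline) | 54.89 | No | Can LLMs Effectively Leverage Graph Structural Information through Prompts, and Why? | 2023-09-28 | Code |
| 32 | Claude-2 (Baseline) | 49.02 | No | Can LLMs Effectively Leverage Graph Structural Information through Prompts, and Why? | 2023-09-28 | Code |
| 33 | Open SQL-7B | 47.74 | No | - | - | - |
| 34 | CoT + ChatGPT | 40.08 | No | Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs | 2023-05-04 | Code |
| 35 | ChatGPT (Baseline) | 39.3 | No | Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs | 2023-05-04 | Code |
| 36 | Codex (Baseline) | 36.47 | No | Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs | 2023-05-04 | Code |
| 37 | Palm-2 (Baseline) | 33.04 | No | Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs | 2023-05-04 | Code |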