Text-To-SQL on BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation)

Metric: Execution Accuracy % (Test) (higher is better)
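
As a rough illustration of the metric, the sketch below assumes execution accuracy is the percentage of examples for which the predicted SQL and the reference SQL return the same set of rows when run against the same database. The helper names (`execution_match`, `execution_accuracy`) and the toy `singer` table are hypothetical and are not taken from the benchmark's own evaluation code.

```python
import sqlite3


def execution_match(conn, predicted_sql, gold_sql):
    """Return True if both queries yield the same set of rows."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # Assumption: a predicted query that fails to execute counts as wrong.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    # Assumption: results are compared as sets, so row order is ignored.
    return set(predicted_rows) == set(gold_rows)


def execution_accuracy(conn, pairs):
    """Percentage of (predicted, gold) query pairs whose results match."""
    if not pairs:
        return 0.0
    hits = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return 100.0 * hits / len(pairs)


# Toy in-memory database, purely illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO singer VALUES (?, ?)",
    [("Ann", 30), ("Bo", 41), ("Cy", 25)],
)

pairs = [
    # Different SQL text, same rows: counts as correct.
    ("SELECT name FROM singer WHERE age > 28",
     "SELECT name FROM singer WHERE age >= 30"),
    # Returns extra rows: counts as wrong.
    ("SELECT name FROM singer",
     "SELECT name FROM singer WHERE age < 30"),
    # Misspelled column, fails to execute: counts as wrong.
    ("SELECT nme FROM singer",
     "SELECT name FROM singer"),
    # Equivalent aggregates: counts as correct.
    ("SELECT COUNT(*) FROM singer",
     "SELECT COUNT(name) FROM singer"),
]

score = execution_accuracy(conn, pairs)
print(f"{score:.2f}")
```

Because the comparison is on query results rather than query text, two syntactically different queries can both count as correct, which is why the metric is reported as an accuracy over executed outputs.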

Results

#	Model	Execution Accuracy % (Test)	Extra Data	Paper	Date	Code
1	XiYan-SQL	75.63	No	A Preview of XiYan-SQL: A Multi-Generator Ensemble Framework for Text-to-SQL	2024-11-13	Code
2	DSAIR + GPT-4o	74.12	No	-	-	-
3	CHASE-SQL + Gemini	74.06	No	CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL	2024-10-02	-
4	ExSL + granite-34b-code	73.17	No	-	-	-
5	OpenSearch-SQL+ v2 + GPT-4o	72.28	No	-	-	-
6	Distillery + GPT-4o	71.83	No	The Death of Schema Linking? Text-to-SQL in the Age of Well-Reasoned Language Models	2024-08-14	-
7	Insights AI	70.26	No	-	-	-
8	PURPLE + RED + GPT-4o	70.21	No	-	-	-
9	MCTS-SQL	69.40	No	-	-	-
10	RECAP + Gemini	69.03	No	-	-	-
11	ByteBrain	68.87	No	-	-	-
12	ExSL + granite-20b-code	67.86	No	-	-	-
13	CHESS	66.69	No	CHESS: Contextual Harnessing for Efficient SQL Synthesis	2024-05-27	Code
14	Arcwise + GPT-4o	66.21	No	-	-	-
15	MCS-SQL + GPT-4	65.45	No	-	-	-
16	SCL-SQL	65.23	No	-	-	-
17	OpenSearch-SQL v1 + GPT-4	64.95	No	-	-	-
18	PB-SQL v1	64.84	No	-	-	-
19	PURPLE + GPT-4o	64.51	No	-	-	-
20	MSL-SQL + DeepSeek-V2.5	64.00	No	-	-	-
21	SENSE-13B	63.39	No	-	-	-
22	SENSE	63.39	No	-	-	-
23	GRA-SQL	63.22	No	-	-	-
24	SuperSQL	62.66	No	-	-	-
25	Dubo-SQL, v1	60.71	No	-	-	-
26	SFT CodeS-15B	60.37	No	-	-	-
27	MAC-SQL + GPT-4	59.59	No	MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL	2023-12-18	Code
28	SFT CodeS-7B	59.25	No	-	-	-
29	DAIL-SQL + GPT-4	57.41	No	Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation	2023-08-29	Code
30	DIN-SQL + GPT-4	55.90	No	DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction	2023-04-21	Code
31	GPT-4 (Baseline)	54.89	No	Can LLMs Effectively Leverage Graph Structural Information through Prompts, and Why?	2023-09-28	Code
32	Claude-2 (Baseline)	49.02	No	Can LLMs Effectively Leverage Graph Structural Information through Prompts, and Why?	2023-09-28	Code
33	Open SQL-7B	47.74	No	-	-	-
34	CoT + ChatGPT	40.08	No	Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs	2023-05-04	Code
35	ChatGPT (Baseline)	39.30	No	Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs	2023-05-04	Code
36	Codex (Baseline)	36.47	No	Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs	2023-05-04	Code
37	Palm-2 (Baseline)	33.04	No	Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs	2023-05-04	Code