TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/A Preview of XiYan-SQL: A Multi-Generator Ensemble Framewo...

A Preview of XiYan-SQL: A Multi-Generator Ensemble Framework for Text-to-SQL

Yingqi Gao, Yifu Liu, Xiaoxia Li, Xiaorong Shi, Yin Zhu, Yiming Wang, Shiqi Li, Wei Li, Yuntao Hong, Zhiling Luo, Jinyang Gao, Liyu Mou, Yu Li

2024-11-13Text-To-SQLNamed Entity RecognitionLarge Language Model
PaperPDFCode(official)CodeCodeCodeCode

Abstract

To tackle the challenges of large language model performance in natural language to SQL tasks, we introduce XiYan-SQL, an innovative framework that employs a multi-generator ensemble strategy to improve candidate generation. We introduce M-Schema, a semi-structured schema representation method designed to enhance the understanding of database structures. To enhance the quality and diversity of generated candidate SQL queries, XiYan-SQL integrates the significant potential of in-context learning (ICL) with the precise control of supervised fine-tuning. On one hand, we propose a series of training strategies to fine-tune models to generate high-quality candidates with diverse preferences. On the other hand, we implement the ICL approach with an example selection method based on named entity recognition to prevent overemphasis on entities. The refiner optimizes each candidate by correcting logical or syntactical errors. To address the challenge of identifying the best candidate, we fine-tune a selection model to distinguish nuances of candidate SQL queries. The experimental results on multiple dialect datasets demonstrate the robustness of XiYan-SQL in addressing challenges across different scenarios. Overall, our proposed XiYan-SQL achieves the state-of-the-art execution accuracy of 75.63% on Bird benchmark, 89.65% on the Spider test set, 69.86% on SQL-Eval, 41.20% on NL2GQL. The proposed framework not only enhances the quality and diversity of SQL queries but also outperforms previous methods.

Results

TaskDatasetMetricValueModel
Semantic ParsingspiderExecution Accuracy (Test)89.65XiYan-SQL
Semantic ParsingBIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation)Execution Accuracy % (Dev)73.34XiYan-SQL
Semantic ParsingBIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation)Execution Accuracy % (Test)75.63XiYan-SQL
Semantic ParsingSQL-EvalExecution Accuracy69.86XiYan-SQL
Text-To-SQLspiderExecution Accuracy (Test)89.65XiYan-SQL
Text-To-SQLBIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation)Execution Accuracy % (Dev)73.34XiYan-SQL
Text-To-SQLBIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation)Execution Accuracy % (Test)75.63XiYan-SQL
Text-To-SQLSQL-EvalExecution Accuracy69.86XiYan-SQL

Related Papers

DENSE: Longitudinal Progress Note Generation with Temporal Modeling of Heterogeneous Clinical Notes Across Hospital Visits2025-07-18GeoReg: Weight-Constrained Few-Shot Regression for Socio-Economic Estimation using LLM2025-07-17The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations2025-07-17Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities2025-07-17Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities2025-07-17Assay2Mol: large language model-based drug design using BioAssay context2025-07-16Draw an Ugly Person An Exploration of Generative AIs Perceptions of Ugliness2025-07-16Learning to Tune Like an Expert: Interpretable and Scene-Aware Navigation via MLLM Reasoning and CVAE-Based Adaptation2025-07-15