DataGpt-SQL-7B: An Open-Source Language Model for Text-to-SQL

Lixia Wu, Peng Li, Junhong Lou, Lei Fu

2024-09-24 | Text-To-SQL | Natural Language Queries | Language Modelling

Abstract

To address the pivotal task of translating natural language queries into SQL commands, we propose a suite of compact, fine-tuned models and self-refine mechanisms that democratize data access and analysis for non-expert users while mitigating the risks associated with closed-source Large Language Models. Specifically, we constructed a Text-to-SQL dataset of over 20K samples, together with a preference dataset, to improve the efficiency of SQL generation. To further ensure code validity, a code corrector was integrated into the model. Our system, DataGpt-sql, achieved 87.2% execution accuracy on Spider-dev, showcasing the effectiveness of our solution in text-to-SQL conversion tasks. Our code, data, and models are available at https://github.com/CainiaoTechAi/datagpt-sql-7b
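
The abstract mentions a code corrector but does not describe how it works. As a rough illustration of the general idea of feeding invalid-SQL errors back to a generator, here is a minimal sketch using Python's standard sqlite3 module. The names `validate_sql`, `generate_with_feedback`, and the `generate` callback are hypothetical and are not taken from the authors' repository; the actual corrector may work quite differently.

```python
import sqlite3
from typing import Callable, Optional


def validate_sql(conn: sqlite3.Connection, sql: str) -> Optional[str]:
    """Return None if SQLite can plan the query, otherwise the error message."""
    try:
        # EXPLAIN QUERY PLAN compiles the statement without running it,
        # so syntax errors and unknown tables/columns surface here.
        conn.execute(f"EXPLAIN QUERY PLAN {sql}")
        return None
    except sqlite3.Error as exc:
        return str(exc)


def generate_with_feedback(
    question: str,
    conn: sqlite3.Connection,
    generate: Callable[[str, Optional[str]], str],
    max_rounds: int = 2,
) -> str:
    """Ask a (hypothetical) generator for SQL, retrying with the error text.

    `generate(question, feedback)` stands in for a model call; `feedback`
    is None on the first attempt and the database error on retries.
    """
    sql = generate(question, None)
    for _ in range(max_rounds):
        error = validate_sql(conn, sql)
        if error is None:
            break
        sql = generate(question, error)
    return sql
```

In this sketch the database itself acts as the validity check, so the loop only catches queries that fail to compile; a query that compiles but returns the wrong rows passes through unchanged.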

Results

Task              Dataset  Metric                      Value  Model
Semantic Parsing  spider   Exact Match Accuracy (Dev)  81.6   datagpt-sql-7B + InvalidSQL-Feedback
Semantic Parsing  spider   Execution Accuracy (Dev)    87.2   datagpt-sql-7B + InvalidSQL-Feedback
Semantic Parsing  spider   Exact Match Accuracy (Dev)  80.3   datagpt-sql-7B
Semantic Parsing  spider   Execution Accuracy (Dev)    84.8   datagpt-sql-7B
Text-To-SQL       spider   Exact Match Accuracy (Dev)  81.6   datagpt-sql-7B + InvalidSQL-Feedback
Text-To-SQL       spider   Execution Accuracy (Dev)    87.2   datagpt-sql-7B + InvalidSQL-Feedback
Text-To-SQL       spider   Exact Match Accuracy (Dev)  80.3   datagpt-sql-7B
Text-To-SQL       spider   Execution Accuracy (Dev)    84.8   datagpt-sql-7B
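
The results report two metrics that can disagree for the same prediction. The sketch below illustrates the difference under simplifying assumptions: `normalized_match` is a plain string comparison after whitespace and case normalization (the official Spider exact-match metric compares parsed SQL components instead), and `execution_match` compares result rows as multisets, ignoring row order. Both function names are hypothetical and this is not the official evaluation script.

```python
import sqlite3
from collections import Counter


def normalized_match(pred: str, gold: str) -> bool:
    """Simplified stand-in for exact match: compare normalized query text."""
    def norm(sql: str) -> str:
        return " ".join(sql.lower().rstrip(";").split())
    return norm(pred) == norm(gold)


def execution_match(conn: sqlite3.Connection, pred: str, gold: str) -> bool:
    """Compare the rows returned by both queries, ignoring row order."""
    try:
        pred_rows = conn.execute(pred).fetchall()
    except sqlite3.Error:
        # A prediction that fails to run counts as incorrect.
        return False
    gold_rows = conn.execute(gold).fetchall()
    return Counter(pred_rows) == Counter(gold_rows)
```

Two differently written queries that return the same rows count as correct under execution matching but not under text matching, which is one reason execution accuracy tends to be the higher of the two figures.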
