
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning

Yiming Huang, Xiao Liu, Yeyun Gong, Zhibin Gou, Yelong Shen, Nan Duan, Weizhu Chen

2024-03-04 | Mathematical Reasoning | Math | Math Word Problem Solving | GSM8K

Abstract

Large language models (LLMs) have shown great potential in complex reasoning tasks, yet their performance is often hampered by the scarcity of high-quality, reasoning-focused training datasets. To address this challenge, we propose Key-Point-Driven Data Synthesis (KPDDS), a novel data synthesis framework that synthesizes question-answer pairs by leveraging key points and exemplar practices from authentic data sources. KPDDS ensures the generation of novel questions with rigorous quality control and substantial scalability. As a result, we present KPMath, an extensive synthetic dataset tailored for mathematical reasoning, comprising over 800K question-answer pairs. Combining KPMath with additional reasoning-intensive corpora, we create the comprehensive KPMath-Plus dataset. The Qwen1.5-72B model, fine-tuned on KPMath-Plus, achieves 87.0% pass@1 accuracy on GSM8K and 58.3% on MATH, surpassing competitors in the 7B to 70B range and the best commercial models, such as GPT-4, across multiple math reasoning datasets.
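
The pass@1 figures quoted in the abstract count a problem as solved when the model's single generated answer matches the reference answer. A minimal sketch of that computation follows, assuming exact string match after light normalization. The function names and the normalization rule are illustrative assumptions, not the paper's evaluation code.

```python
def normalize(answer: str) -> str:
    """Hypothetical normalization: trim whitespace, drop thousands separators
    and a trailing period. Real evaluation scripts are usually more thorough."""
    return answer.strip().replace(",", "").rstrip(".")


def pass_at_1(predictions: list[str], references: list[str]) -> float:
    """Percentage of problems whose single predicted answer equals the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("need at least one problem")
    correct = sum(
        normalize(pred) == normalize(ref)
        for pred, ref in zip(predictions, references)
    )
    return 100.0 * correct / len(references)


# Toy data for illustration only (not taken from GSM8K or MATH).
preds = ["18", "3", " 1,000 ", "7"]
refs = ["18", "5", "1000", "7"]
print(pass_at_1(preds, refs))  # 3 of 4 answers match
```

With one answer per problem, pass@1 reduces to plain accuracy, which is why the results table reports it under the "Accuracy" metric.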

Results

Dataset   Model                          Accuracy   Parameters (Billions)
MATH      DeepSeekMath-7B-KPMath-Plus    48.8       7
MATH      Llemma-34B-KPMath-Plus         48.6       34
MATH      Mistral-7B-KPMath-Plus         46.8       7
MATH      Llama2-13B-KPMath-Plus         41         13

The same four results are listed under each of these tasks: Question Answering, Math Word Problem Solving, Mathematical Question Answering, Mathematical Reasoning.

Related Papers

VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks (2025-07-17)
QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation (2025-07-17)
GEMMAS: Graph-based Evaluation Metrics for Multi Agent Systems (2025-07-17)
A Survey of Deep Learning for Geometry Problem Solving (2025-07-16)
Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training (2025-07-16)
DAC: A Dynamic Attention-aware Approach for Task-Agnostic Prompt Compression (2025-07-16)
KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning? (2025-07-15)
Temperature and Persona Shape LLM Agent Consensus With Minimal Accuracy Gains in Qualitative Coding (2025-07-15)