
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning

Zui Chen, Yezeng Chen, Jiaqi Han, Zhijie Huang, Ji Qi, Yi Zhou

Date: 2024-02-23
Tasks: Math Word Problem Solving, Automated Theorem Proving, Arithmetic Reasoning

Abstract

Large language models (LLMs) are displaying emergent abilities on math reasoning tasks, and there is growing attention on enhancing the ability of open-source LLMs through supervised fine-tuning (SFT). In this paper, we aim to explore a general data strategy for supervised data to help optimize and expand math reasoning ability. First, we determine the ability boundary of reasoning-path augmentation by identifying the minimal optimal set of these paths. Second, we validate that different abilities of the model can be cumulatively enhanced by a Mix of Minimal Optimal Sets (MMOS) of the corresponding types of data, and our MMOS models achieve SOTA performance on a series of base models at much lower construction cost. In addition, we point out that GSM-HARD is not really hard and that today's LLMs no longer lack numerical robustness. We also provide an Auto Problem Generator for robustness testing and educational applications. Our code and data are publicly available at https://github.com/cyzhh/MMOS.

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Question Answering | MATH | Accuracy | 63.7 | MMOS-DeepSeekMath-7B(0-shot,k=50) |
| Question Answering | MATH | Parameters (Billions) | 7 | MMOS-DeepSeekMath-7B(0-shot,k=50) |
| Question Answering | MATH | Accuracy | 55 | MMOS-DeepSeekMath-7B(0-shot) |
| Question Answering | MATH | Parameters (Billions) | 7 | MMOS-DeepSeekMath-7B(0-shot) |
| Question Answering | MATH | Accuracy | 49.5 | MMOS-CODE-34B(0-shot) |
| Question Answering | MATH | Parameters (Billions) | 34 | MMOS-CODE-34B(0-shot) |
| Question Answering | MATH | Accuracy | 44.3 | MMOS-CODE-7B(0-shot) |
| Question Answering | MATH | Parameters (Billions) | 7 | MMOS-CODE-7B(0-shot) |
| Question Answering | ASDiv-A | Execution Accuracy | 87.6 | MMOS-DeepSeekMath-7B(0-shot) |
| Question Answering | ASDiv-A | Execution Accuracy | 85.1 | MMOS-CODE-34B(0-shot) |
| Question Answering | ASDiv-A | Execution Accuracy | 78.6 | MMOS-CODE-7B(0-shot) |
| Question Answering | SVAMP | Execution Accuracy | 80.6 | MMOS-CODE-34B(0-shot) |
| Question Answering | SVAMP | Execution Accuracy | 79.3 | MMOS-DeepSeekMath-7B(0-shot) |
| Question Answering | SVAMP | Execution Accuracy | 76.4 | MMOS-CODE-7B(0-shot) |
| Automated Theorem Proving | miniF2F-test | Pass@1 | 28.3 | MMOS-DeepSeekMath-7B |
| Automated Theorem Proving | miniF2F-test | cumulative | 28.3 | MMOS-DeepSeekMath-7B |
| Mathematical Proofs | miniF2F-test | Pass@1 | 28.3 | MMOS-DeepSeekMath-7B |
| Mathematical Proofs | miniF2F-test | cumulative | 28.3 | MMOS-DeepSeekMath-7B |
| Math Word Problem Solving | MATH | Accuracy | 63.7 | MMOS-DeepSeekMath-7B(0-shot,k=50) |
| Math Word Problem Solving | MATH | Parameters (Billions) | 7 | MMOS-DeepSeekMath-7B(0-shot,k=50) |
| Math Word Problem Solving | MATH | Accuracy | 55 | MMOS-DeepSeekMath-7B(0-shot) |
| Math Word Problem Solving | MATH | Parameters (Billions) | 7 | MMOS-DeepSeekMath-7B(0-shot) |
| Math Word Problem Solving | MATH | Accuracy | 49.5 | MMOS-CODE-34B(0-shot) |
| Math Word Problem Solving | MATH | Parameters (Billions) | 34 | MMOS-CODE-34B(0-shot) |
| Math Word Problem Solving | MATH | Accuracy | 44.3 | MMOS-CODE-7B(0-shot) |
| Math Word Problem Solving | MATH | Parameters (Billions) | 7 | MMOS-CODE-7B(0-shot) |
| Math Word Problem Solving | ASDiv-A | Execution Accuracy | 87.6 | MMOS-DeepSeekMath-7B(0-shot) |
| Math Word Problem Solving | ASDiv-A | Execution Accuracy | 85.1 | MMOS-CODE-34B(0-shot) |
| Math Word Problem Solving | ASDiv-A | Execution Accuracy | 78.6 | MMOS-CODE-7B(0-shot) |
| Math Word Problem Solving | SVAMP | Execution Accuracy | 80.6 | MMOS-CODE-34B(0-shot) |
| Math Word Problem Solving | SVAMP | Execution Accuracy | 79.3 | MMOS-DeepSeekMath-7B(0-shot) |
| Math Word Problem Solving | SVAMP | Execution Accuracy | 76.4 | MMOS-CODE-7B(0-shot) |
| Mathematical Question Answering | MATH | Accuracy | 63.7 | MMOS-DeepSeekMath-7B(0-shot,k=50) |
| Mathematical Question Answering | MATH | Parameters (Billions) | 7 | MMOS-DeepSeekMath-7B(0-shot,k=50) |
| Mathematical Question Answering | MATH | Accuracy | 55 | MMOS-DeepSeekMath-7B(0-shot) |
| Mathematical Question Answering | MATH | Parameters (Billions) | 7 | MMOS-DeepSeekMath-7B(0-shot) |
| Mathematical Question Answering | MATH | Accuracy | 49.5 | MMOS-CODE-34B(0-shot) |
| Mathematical Question Answering | MATH | Parameters (Billions) | 34 | MMOS-CODE-34B(0-shot) |
| Mathematical Question Answering | MATH | Accuracy | 44.3 | MMOS-CODE-7B(0-shot) |
| Mathematical Question Answering | MATH | Parameters (Billions) | 7 | MMOS-CODE-7B(0-shot) |
| Mathematical Question Answering | ASDiv-A | Execution Accuracy | 87.6 | MMOS-DeepSeekMath-7B(0-shot) |
| Mathematical Question Answering | ASDiv-A | Execution Accuracy | 85.1 | MMOS-CODE-34B(0-shot) |
| Mathematical Question Answering | ASDiv-A | Execution Accuracy | 78.6 | MMOS-CODE-7B(0-shot) |
| Mathematical Question Answering | SVAMP | Execution Accuracy | 80.6 | MMOS-CODE-34B(0-shot) |
| Mathematical Question Answering | SVAMP | Execution Accuracy | 79.3 | MMOS-DeepSeekMath-7B(0-shot) |
| Mathematical Question Answering | SVAMP | Execution Accuracy | 76.4 | MMOS-CODE-7B(0-shot) |
| Mathematical Reasoning | MATH | Accuracy | 63.7 | MMOS-DeepSeekMath-7B(0-shot,k=50) |
| Mathematical Reasoning | MATH | Parameters (Billions) | 7 | MMOS-DeepSeekMath-7B(0-shot,k=50) |
| Mathematical Reasoning | MATH | Accuracy | 55 | MMOS-DeepSeekMath-7B(0-shot) |
| Mathematical Reasoning | MATH | Parameters (Billions) | 7 | MMOS-DeepSeekMath-7B(0-shot) |
| Mathematical Reasoning | MATH | Accuracy | 49.5 | MMOS-CODE-34B(0-shot) |
| Mathematical Reasoning | MATH | Parameters (Billions) | 34 | MMOS-CODE-34B(0-shot) |
| Mathematical Reasoning | MATH | Accuracy | 44.3 | MMOS-CODE-7B(0-shot) |
| Mathematical Reasoning | MATH | Parameters (Billions) | 7 | MMOS-CODE-7B(0-shot) |
| Mathematical Reasoning | ASDiv-A | Execution Accuracy | 87.6 | MMOS-DeepSeekMath-7B(0-shot) |
| Mathematical Reasoning | ASDiv-A | Execution Accuracy | 85.1 | MMOS-CODE-34B(0-shot) |
| Mathematical Reasoning | ASDiv-A | Execution Accuracy | 78.6 | MMOS-CODE-7B(0-shot) |
| Mathematical Reasoning | SVAMP | Execution Accuracy | 80.6 | MMOS-CODE-34B(0-shot) |
| Mathematical Reasoning | SVAMP | Execution Accuracy | 79.3 | MMOS-DeepSeekMath-7B(0-shot) |
| Mathematical Reasoning | SVAMP | Execution Accuracy | 76.4 | MMOS-CODE-7B(0-shot) |
| Arithmetic Reasoning | GSM8K | Accuracy | 87.2 | MMOS-DeepSeekMath-7B(0-shot,k=50) |
| Arithmetic Reasoning | GSM8K | Parameters (Billions) | 7 | MMOS-DeepSeekMath-7B(0-shot,k=50) |
| Arithmetic Reasoning | GSM8K | Accuracy | 80.5 | MMOS-DeepSeekMath-7B(0-shot) |
| Arithmetic Reasoning | GSM8K | Parameters (Billions) | 7 | MMOS-DeepSeekMath-7B(0-shot) |
| Arithmetic Reasoning | GSM8K | Accuracy | 80.4 | MMOS-CODE-34B(0-shot) |
| Arithmetic Reasoning | GSM8K | Parameters (Billions) | 34 | MMOS-CODE-34B(0-shot) |
| Arithmetic Reasoning | GSM8K | Accuracy | 73.9 | MMOS-CODE-7B(0-shot) |
| Arithmetic Reasoning | GSM8K | Parameters (Billions) | 7 | MMOS-CODE-7B(0-shot) |

Related Papers

DCR: Quantifying Data Contamination in LLMs Evaluation (2025-07-15)
CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization (2025-07-08)
DS@GT at CheckThat! 2025: Evaluating Context and Tokenization Strategies for Numerical Fact Verification (2025-07-08)
Prover Agent: An Agent-based Framework for Formal Mathematical Proofs (2025-06-24)
Towards Advanced Mathematical Reasoning for LLMs via First-Order Logic Theorem Proving (2025-06-20)
FinLMM-R1: Enhancing Financial Reasoning in LMM through Scalable Data and Reward Design (2025-06-16)
MATP-BENCH: Can MLLM Be a Good Automated Theorem Prover for Multimodal Problems? (2025-06-06)
Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification (2025-06-05)