Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Augmenting Math Word Problems via Iterative Question Composing

Haoxiong Liu, Yifan Zhang, Yifan Luo, Andrew Chi-Chih Yao

2024-01-17 · Mathematical Reasoning · Math · Math Word Problem Solving
Paper · PDF · Code (official)

Abstract

Despite the advancements in large language models (LLMs) for mathematical reasoning, solving competition-level math problems remains a significant challenge, especially for open-source LLMs without external tools. We introduce the MMIQC dataset, comprising a mixture of processed web data and synthetic question-response pairs, aimed at enhancing the mathematical reasoning capabilities of base language models. Models fine-tuned on MMIQC consistently surpass their counterparts in performance on the MATH benchmark across various model sizes. Notably, Qwen-72B-MMIQC achieves 45.0% accuracy, exceeding the previous open-source state of the art by 8.2% and outperforming the initial version of GPT-4 released in 2023. Extensive evaluation results on Hungarian high school finals suggest that this improvement generalizes to unseen data. Our ablation study on MMIQC reveals that a large part of the improvement can be attributed to our novel augmentation method, Iterative Question Composing (IQC), which iteratively composes new questions from seed problems using an LLM and applies rejection sampling through another LLM.
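The IQC procedure outlined in the abstract can be sketched as a simple loop: a "composer" LLM derives a new question from each seed problem, a second LLM answers it, and rejection sampling keeps only candidates whose responses pass a check; accepted questions become seeds for the next iteration. The sketch below uses hypothetical stub functions (`compose`, `answer`, `accept`) in place of real LLM calls and an actual answer-verification criterion, which the abstract does not specify:

```python
def iterative_question_composing(seed_problems, compose, answer, accept, iterations=3):
    """Grow a dataset of (question, response) pairs from seed problems.

    compose: composer LLM — writes a new question derived from an existing one
    answer:  second LLM — produces a candidate response to a question
    accept:  rejection-sampling check — keep the pair only if it passes
    """
    dataset = []
    frontier = list(seed_problems)
    for _ in range(iterations):
        next_frontier = []
        for question in frontier:
            new_q = compose(question)      # compose a new question from the seed
            response = answer(new_q)       # sample a response with the other LLM
            if accept(new_q, response):    # rejection sampling: discard failures
                dataset.append((new_q, response))
                next_frontier.append(new_q)  # accepted questions seed the next round
        frontier = next_frontier
    return dataset


# Toy stand-ins; real use would call LLM APIs and a proper answer checker.
compose = lambda q: q + " Now double the result."
answer = lambda q: "42"
accept = lambda q, r: r.strip().isdigit()

augmented = iterative_question_composing(["What is 21 + 21?"], compose, answer, accept,
                                         iterations=2)
```

The key design point is that composition is iterative: each accepted question re-enters the frontier, so question complexity can grow across rounds rather than all augmentations being one step away from the original seeds.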

Results

Task | Dataset | Metric | Value | Model
Question Answering | MATH | Accuracy | 45 | MMIQC-72B
Question Answering | MATH | Parameters (Billions) | 72 | MMIQC-72B
Math Word Problem Solving | MATH | Accuracy | 45 | MMIQC-72B
Math Word Problem Solving | MATH | Parameters (Billions) | 72 | MMIQC-72B
Mathematical Question Answering | MATH | Accuracy | 45 | MMIQC-72B
Mathematical Question Answering | MATH | Parameters (Billions) | 72 | MMIQC-72B
Mathematical Reasoning | MATH | Accuracy | 45 | MMIQC-72B
Mathematical Reasoning | MATH | Parameters (Billions) | 72 | MMIQC-72B

Related Papers

VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks (2025-07-17)
QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation (2025-07-17)
A Survey of Deep Learning for Geometry Problem Solving (2025-07-16)
Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training (2025-07-16)
KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning? (2025-07-15)
Temperature and Persona Shape LLM Agent Consensus With Minimal Accuracy Gains in Qualitative Coding (2025-07-15)
Personalized Exercise Recommendation with Semantically-Grounded Knowledge Tracing (2025-07-15)
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination (2025-07-14)