Papers With Code 2



Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Learning Multi-Step Reasoning by Solving Arithmetic Tasks

Tianduo Wang, Wei Lu

2023-06-02 · Tags: Mathematical Reasoning, Math, Math Word Problem Solving
Links: Paper · PDF · Code (official)

Abstract

Mathematical reasoning is regarded as a necessary ability for Language Models (LMs). Recent works demonstrate large LMs' impressive performance in solving math problems. This success is attributed to their Chain-of-Thought (CoT) reasoning abilities, i.e., the ability to decompose complex questions into step-by-step reasoning chains, but this ability seems to emerge only in models with abundant parameters. This work investigates how to equip relatively small LMs with multi-step reasoning capabilities. We propose to inject these abilities by continually pre-training LMs on MsAT, a synthetic dataset composed of Multi-step Arithmetic Tasks. Experiments on four math word problem datasets show that the proposed method effectively enhances LMs' math reasoning abilities.
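To make the idea of a "multi-step arithmetic task" concrete, the sketch below generates toy tasks in which each operation consumes the result of the previous one, so the answer can only be reached by composing intermediate steps. This is a hypothetical illustration, not the MsAT format or the authors' generation code: the variable names `A`, `B`, ..., the intermediate labels `m1`, `m2`, ..., the operator set, and the operand range 1 to 9 are all assumptions.

```python
import random
from dataclasses import dataclass

# Hypothetical operator set; the real MsAT operator inventory may differ.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


@dataclass
class ArithmeticTask:
    question: str           # prompt listing variable values and the target expression
    expression: str         # fully parenthesised expression over variable names
    values: dict[str, int]  # variable name -> value
    steps: list[str]        # one intermediate result per operation
    answer: int             # final numeric answer


def make_task(num_steps: int, rng: random.Random) -> ArithmeticTask:
    """Chain `num_steps` binary operations left to right over variables A, B, C, ..."""
    if not 1 <= num_steps <= 25:
        raise ValueError("num_steps must be between 1 and 25")
    names = [chr(ord("A") + i) for i in range(num_steps + 1)]
    values = {name: rng.randint(1, 9) for name in names}
    ops = [rng.choice(list(OPS)) for _ in range(num_steps)]

    expression = names[0]
    acc = values[names[0]]
    steps = []
    for i, (op, name) in enumerate(zip(ops, names[1:]), start=1):
        # Each step wraps the previous sub-expression, forcing left-to-right composition.
        expression = f"({expression} {op} {name})"
        prev = names[0] if i == 1 else f"m{i - 1}"
        acc = OPS[op](acc, values[name])
        steps.append(f"m{i} = {prev} {op} {name} = {acc}")

    assignments = ", ".join(f"{n} = {v}" for n, v in values.items())
    question = f"Given {assignments}, compute {expression}."
    return ArithmeticTask(question, expression, values, steps, acc)


if __name__ == "__main__":
    task = make_task(3, random.Random(0))
    print(task.question)
    for step in task.steps:
        print(" ", step)
    print("answer:", task.answer)
```

Varying `num_steps` is one simple way to control how many reasoning steps a task requires; whether and how the paper controls difficulty should be checked against the official code.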

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Question Answering | MAWPS | Accuracy (%) | 94.3 | MsAT-DeductReasoner |
| Question Answering | SVAMP | Execution Accuracy | 48.9 | MsAT-DeductReasoner |
| Math Word Problem Solving | MAWPS | Accuracy (%) | 94.3 | MsAT-DeductReasoner |
| Math Word Problem Solving | SVAMP | Execution Accuracy | 48.9 | MsAT-DeductReasoner |
| Mathematical Question Answering | MAWPS | Accuracy (%) | 94.3 | MsAT-DeductReasoner |
| Mathematical Question Answering | SVAMP | Execution Accuracy | 48.9 | MsAT-DeductReasoner |
| Mathematical Reasoning | MAWPS | Accuracy (%) | 94.3 | MsAT-DeductReasoner |
| Mathematical Reasoning | SVAMP | Execution Accuracy | 48.9 | MsAT-DeductReasoner |
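"Execution Accuracy" is commonly understood as executing the model's predicted equation and comparing the numeric result with the gold answer. The sketch below assumes that reading; the function names, the 1e-4 absolute tolerance, and the rule that unparseable or failing expressions count as wrong are assumptions, and the official SVAMP or DeductReasoner evaluation scripts may differ in detail.

```python
import ast
import math
import operator

# Only these binary operators are allowed in predicted expressions.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def safe_eval(expr: str) -> float:
    """Evaluate an arithmetic expression made of numbers, + - * / and unary minus."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        # Names, calls, attributes, etc. are rejected rather than executed.
        raise ValueError(f"unsupported expression: {expr!r}")
    return _eval(ast.parse(expr, mode="eval"))


def execution_accuracy(predicted: list[str], gold: list[float], tol: float = 1e-4) -> float:
    """Percentage of predicted expressions whose executed value matches the gold answer."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    correct = 0
    for expr, answer in zip(predicted, gold):
        try:
            value = safe_eval(expr)
        except (ValueError, SyntaxError, ZeroDivisionError):
            continue  # expressions that cannot be executed count as wrong
        if math.isclose(value, answer, rel_tol=0.0, abs_tol=tol):
            correct += 1
    return 100.0 * correct / len(gold) if gold else 0.0
```

For example, `execution_accuracy(["3 + 4 * 2", "(10 - 4) / 3", "5 / 0"], [11, 2, 1])` returns roughly 66.67: the first two expressions execute to the gold values, and the division by zero is scored as wrong. Parsing with `ast` instead of calling `eval` keeps model output from running arbitrary code.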

Related Papers

- VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks (2025-07-17)
- QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation (2025-07-17)
- A Survey of Deep Learning for Geometry Problem Solving (2025-07-16)
- Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training (2025-07-16)
- KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning? (2025-07-15)
- Temperature and Persona Shape LLM Agent Consensus With Minimal Accuracy Gains in Qualitative Coding (2025-07-15)
- Personalized Exercise Recommendation with Semantically-Grounded Knowledge Tracing (2025-07-15)
- Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination (2025-07-14)