Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


AlphaMath Almost Zero: Process Supervision without Process

Guoxin Chen, Minpeng Liao, Chengxi Li, Kai Fan

2024-05-06 · Mathematical Reasoning · Math Word Problem Solving

Paper · PDF · Code (official)

Abstract

Although recent advancements in large language models (LLMs) have significantly improved their performance on various tasks, they still face challenges with complex and symbolic multi-step reasoning, particularly in mathematical reasoning. To bolster the mathematical reasoning capabilities of LLMs, most existing efforts concentrate on seeking assistance from either domain experts or GPT-4 for high-quality process-supervised data, which is not only expensive but also labor-intensive. In our study, we propose an innovative framework, AlphaMath, that bypasses the need for process annotations (from humans or GPTs) by leveraging Monte Carlo Tree Search (MCTS). This framework focuses on unleashing the potential of a well-pretrained LLM to autonomously enhance its mathematical reasoning. Specifically, we integrate a value model with the LLM, automatically generating both process supervision and step-level evaluation signals in MCTS. Furthermore, we propose an efficient inference strategy, step-level beam search, where the value model is crafted to assist the policy model (i.e., LLM) in navigating more effective reasoning paths, rather than solely relying on prior probabilities. The experimental results on both in-domain and out-of-domain datasets demonstrate that even without GPT-4 or human-annotated process supervision, our AlphaMath framework achieves comparable or superior results to previous state-of-the-art methods.
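The step-level beam search described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: `policy` (returning candidate next reasoning steps with prior probabilities), `value` (scoring a partial solution), and the `<answer>` terminal marker are all hypothetical stand-ins for the paper's LLM policy and trained value model.

```python
import heapq

def step_level_beam_search(policy, value, question, beam_width=3, max_steps=10):
    """Expand reasoning step by step, keeping the top-scoring partial
    solutions. Candidates are ranked by the value model's score of the
    extended path, not solely by the policy's prior probabilities."""
    beams = [(0.0, [question])]  # (score, reasoning steps so far)
    for _ in range(max_steps):
        candidates = []
        for score, steps in beams:
            if steps[-1].endswith("<answer>"):  # already terminal: keep as-is
                candidates.append((score, steps))
                continue
            for step, prior in policy(steps):
                new_steps = steps + [step]
                # The value model guides the search toward promising paths.
                candidates.append((value(new_steps), new_steps))
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
        if all(s[-1].endswith("<answer>") for _, s in beams):
            break
    return beams[0][1]  # highest-scoring complete reasoning path
```

The design point the abstract emphasizes is the ranking criterion: ordinary beam search over an LLM ranks continuations by the policy's own probabilities, whereas here the value model, trained from MCTS-generated step-level signals, does the ranking.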

Results

Task                            | Dataset | Metric   | Value | Model
Question Answering              | MATH    | Accuracy | 66.3  | AlphaMath-7B-SBS@3
Math Word Problem Solving       | MATH    | Accuracy | 66.3  | AlphaMath-7B-SBS@3
Mathematical Question Answering | MATH    | Accuracy | 66.3  | AlphaMath-7B-SBS@3
Mathematical Reasoning          | MATH    | Accuracy | 66.3  | AlphaMath-7B-SBS@3

Related Papers

VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks (2025-07-17)
A Survey of Deep Learning for Geometry Problem Solving (2025-07-16)
KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning? (2025-07-15)
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination (2025-07-14)
A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning (2025-07-11)
Integrating External Tools with Large Language Models to Improve Accuracy (2025-07-09)
Skywork-R1V3 Technical Report (2025-07-08)
CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization (2025-07-08)