Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


AlphaMath Almost Zero: Process Supervision without Process

Guoxin Chen, Minpeng Liao, Chengxi Li, Kai Fan

2024-05-06 · Mathematical Reasoning · Math Word Problem Solving

Paper · PDF · Code (official)

Abstract

Although recent advancements in large language models (LLMs) have significantly improved their performance on various tasks, they still face challenges with complex and symbolic multi-step reasoning, particularly in mathematical reasoning. To bolster the mathematical reasoning capabilities of LLMs, most existing efforts concentrate on seeking assistance from either domain experts or GPT-4 for high-quality process-supervised data, which is not only expensive but also labor-intensive. In our study, we propose an innovative framework, AlphaMath, that bypasses the need for process annotations (from humans or GPTs) by leveraging Monte Carlo Tree Search (MCTS). This framework focuses on unleashing the potential of a well-pretrained LLM to autonomously enhance its mathematical reasoning. Specifically, we integrate a value model with the LLM, automatically generating both process supervision and step-level evaluation signals in MCTS. Furthermore, we propose an efficient inference strategy, step-level beam search, where the value model is crafted to assist the policy model (i.e., LLM) in navigating more effective reasoning paths, rather than solely relying on prior probabilities. The experimental results on both in-domain and out-of-domain datasets demonstrate that even without GPT-4 or human-annotated process supervision, our AlphaMath framework achieves comparable or superior results to previous state-of-the-art methods.
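The step-level beam search described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: `policy` (returning candidate next reasoning steps with prior probabilities), `value` (scoring a partial solution), and the `<answer>` terminal marker are all hypothetical stand-ins for the paper's LLM policy and trained value model.

```python
import heapq

def step_level_beam_search(policy, value, question, beam_width=3, max_steps=10):
    """Expand reasoning step by step, keeping the top-scoring partial
    solutions. Candidates are ranked by the value model's score of the
    extended path, not solely by the policy's prior probabilities."""
    beams = [(0.0, [question])]  # (score, reasoning steps so far)
    for _ in range(max_steps):
        candidates = []
        for score, steps in beams:
            if steps[-1].endswith("<answer>"):  # already terminal: keep as-is
                candidates.append((score, steps))
                continue
            for step, prior in policy(steps):
                new_steps = steps + [step]
                # The value model guides the search toward promising paths.
                candidates.append((value(new_steps), new_steps))
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
        if all(s[-1].endswith("<answer>") for _, s in beams):
            break
    return beams[0][1]  # highest-scoring complete reasoning path
```

The design point the abstract emphasizes is the ranking criterion: ordinary beam search over an LLM ranks continuations by the policy's own probabilities, whereas here the value model, trained from MCTS-generated step-level signals, does the ranking.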

Results

Task                            | Dataset | Metric   | Value | Model
Question Answering              | MATH    | Accuracy | 66.3  | AlphaMath-7B-SBS@3
Math Word Problem Solving       | MATH    | Accuracy | 66.3  | AlphaMath-7B-SBS@3
Mathematical Question Answering | MATH    | Accuracy | 66.3  | AlphaMath-7B-SBS@3
Mathematical Reasoning          | MATH    | Accuracy | 66.3  | AlphaMath-7B-SBS@3

Related Papers

VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks (2025-07-17)
A Survey of Deep Learning for Geometry Problem Solving (2025-07-16)
KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning? (2025-07-15)
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination (2025-07-14)
A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning (2025-07-11)
Integrating External Tools with Large Language Models to Improve Accuracy (2025-07-09)
Skywork-R1V3 Technical Report (2025-07-08)
CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization (2025-07-08)