Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement

An Yang, Beichen Zhang, Binyuan Hui, Bofei Gao, Bowen Yu, Chengpeng Li, Dayiheng Liu, Jianhong Tu, Jingren Zhou, Junyang Lin, Keming Lu, Mingfeng Xue, Runji Lin, Tianyu Liu, Xingzhang Ren, Zhenru Zhang

2024-09-18 | Mathematical Reasoning, Math, Math Word Problem Solving, Philosophy, GSM8K

Abstract

In this report, we present a series of math-specific large language models: Qwen2.5-Math and Qwen2.5-Math-Instruct-1.5B/7B/72B. The core innovation of the Qwen2.5 series lies in integrating the philosophy of self-improvement throughout the entire pipeline, from pre-training and post-training to inference: (1) During the pre-training phase, Qwen2-Math-Instruct is used to generate large-scale, high-quality mathematical data. (2) In the post-training phase, we develop a reward model (RM) by conducting massive sampling from Qwen2-Math-Instruct. This RM is then applied to the iterative evolution of data in supervised fine-tuning (SFT). With a stronger SFT model, it is possible to iteratively train and update the RM, which in turn guides the next round of SFT data iteration. On the final SFT model, we employ the ultimate RM for reinforcement learning, resulting in Qwen2.5-Math-Instruct. (3) Furthermore, during the inference stage, the RM is used to guide sampling, optimizing the model's performance. Qwen2.5-Math-Instruct supports both Chinese and English and possesses advanced mathematical reasoning capabilities, including Chain-of-Thought (CoT) and Tool-Integrated Reasoning (TIR). We evaluate our models on 10 mathematics datasets in both English and Chinese, such as GSM8K, MATH, GaoKao, AMC23, and AIME24, covering a range of difficulties from grade school level to math competition problems.
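
Step (3) above, RM-guided sampling at inference, can be illustrated with a minimal sketch. The abstract does not specify the selection rule, so this example assumes a simple best-of-N scheme: several candidate responses are scored by a reward function and the highest-scoring one is kept. The names `best_of_n` and `toy_reward` are hypothetical, and `toy_reward` is a stand-in for a trained RM, not the report's actual model.

```python
from typing import Callable, Sequence


def best_of_n(candidates: Sequence[str], reward_fn: Callable[[str], float]) -> str:
    """Return the candidate response with the highest reward score.

    Assumption: best-of-N selection; the abstract only says the RM
    "guides sampling" and does not name a specific selection rule.
    """
    if not candidates:
        raise ValueError("need at least one candidate response")
    return max(candidates, key=reward_fn)


def toy_reward(response: str) -> float:
    # Hypothetical stand-in for a trained reward model: it gives a
    # score of 1.0 to responses stating that 2 + 2 equals 4, else 0.0.
    return 1.0 if response.strip().endswith("= 4") else 0.0


# Candidate responses, as if sampled from a policy model for "What is 2 + 2?"
samples = ["2 + 2 = 5", "2 + 2 = 4", "2 + 2 = 3"]
chosen = best_of_n(samples, toy_reward)
print(chosen)
```

In practice the reward function would be a learned model scoring full reasoning traces, and the candidates would come from sampling the policy model several times for the same problem.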

Results

Task	Dataset	Metric	Value	Model
Question Answering	MATH	Accuracy	88.1	Qwen2.5-Math-72B-Instruct(TIR,Greedy)
Question Answering	MATH	Parameters (Billions)	72	Qwen2.5-Math-72B-Instruct(TIR,Greedy)
Question Answering	MATH	Accuracy	85.9	Qwen2.5-Math-72B-Instruct(COT,Greedy)
Question Answering	MATH	Parameters (Billions)	72	Qwen2.5-Math-72B-Instruct(COT,Greedy)
Question Answering	MATH	Accuracy	85.2	Qwen2.5-Math-7B-Instruct(TIR,Greedy)
Question Answering	MATH	Parameters (Billions)	7	Qwen2.5-Math-7B-Instruct(TIR,Greedy)
Question Answering	MATH	Accuracy	83.6	Qwen2.5-Math-7B-Instruct(COT,Greedy)
Question Answering	MATH	Parameters (Billions)	7	Qwen2.5-Math-7B-Instruct(COT,Greedy)
Question Answering	MATH	Accuracy	79.9	Qwen2.5-Math-1.5B-Instruct(TIR,Greedy)
Question Answering	MATH	Parameters (Billions)	1.5	Qwen2.5-Math-1.5B-Instruct(TIR,Greedy)
Question Answering	MATH	Accuracy	75.8	Qwen2.5-Math-1.5B-Instruct(COT,Greedy)
Question Answering	MATH	Parameters (Billions)	1.5	Qwen2.5-Math-1.5B-Instruct(COT,Greedy)
Math Word Problem Solving	MATH	Accuracy	88.1	Qwen2.5-Math-72B-Instruct(TIR,Greedy)
Math Word Problem Solving	MATH	Parameters (Billions)	72	Qwen2.5-Math-72B-Instruct(TIR,Greedy)
Math Word Problem Solving	MATH	Accuracy	85.9	Qwen2.5-Math-72B-Instruct(COT,Greedy)
Math Word Problem Solving	MATH	Parameters (Billions)	72	Qwen2.5-Math-72B-Instruct(COT,Greedy)
Math Word Problem Solving	MATH	Accuracy	85.2	Qwen2.5-Math-7B-Instruct(TIR,Greedy)
Math Word Problem Solving	MATH	Parameters (Billions)	7	Qwen2.5-Math-7B-Instruct(TIR,Greedy)
Math Word Problem Solving	MATH	Accuracy	83.6	Qwen2.5-Math-7B-Instruct(COT,Greedy)
Math Word Problem Solving	MATH	Parameters (Billions)	7	Qwen2.5-Math-7B-Instruct(COT,Greedy)
Math Word Problem Solving	MATH	Accuracy	79.9	Qwen2.5-Math-1.5B-Instruct(TIR,Greedy)
Math Word Problem Solving	MATH	Parameters (Billions)	1.5	Qwen2.5-Math-1.5B-Instruct(TIR,Greedy)
Math Word Problem Solving	MATH	Accuracy	75.8	Qwen2.5-Math-1.5B-Instruct(COT,Greedy)
Math Word Problem Solving	MATH	Parameters (Billions)	1.5	Qwen2.5-Math-1.5B-Instruct(COT,Greedy)
Mathematical Question Answering	MATH	Accuracy	88.1	Qwen2.5-Math-72B-Instruct(TIR,Greedy)
Mathematical Question Answering	MATH	Parameters (Billions)	72	Qwen2.5-Math-72B-Instruct(TIR,Greedy)
Mathematical Question Answering	MATH	Accuracy	85.9	Qwen2.5-Math-72B-Instruct(COT,Greedy)
Mathematical Question Answering	MATH	Parameters (Billions)	72	Qwen2.5-Math-72B-Instruct(COT,Greedy)
Mathematical Question Answering	MATH	Accuracy	85.2	Qwen2.5-Math-7B-Instruct(TIR,Greedy)
Mathematical Question Answering	MATH	Parameters (Billions)	7	Qwen2.5-Math-7B-Instruct(TIR,Greedy)
Mathematical Question Answering	MATH	Accuracy	83.6	Qwen2.5-Math-7B-Instruct(COT,Greedy)
Mathematical Question Answering	MATH	Parameters (Billions)	7	Qwen2.5-Math-7B-Instruct(COT,Greedy)
Mathematical Question Answering	MATH	Accuracy	79.9	Qwen2.5-Math-1.5B-Instruct(TIR,Greedy)
Mathematical Question Answering	MATH	Parameters (Billions)	1.5	Qwen2.5-Math-1.5B-Instruct(TIR,Greedy)
Mathematical Question Answering	MATH	Accuracy	75.8	Qwen2.5-Math-1.5B-Instruct(COT,Greedy)
Mathematical Question Answering	MATH	Parameters (Billions)	1.5	Qwen2.5-Math-1.5B-Instruct(COT,Greedy)
Mathematical Reasoning	AMC23	Accuracy	62.5	Qwen2.5-Math-7B-Instruct
Mathematical Reasoning	MATH	Accuracy	88.1	Qwen2.5-Math-72B-Instruct(TIR,Greedy)
Mathematical Reasoning	MATH	Parameters (Billions)	72	Qwen2.5-Math-72B-Instruct(TIR,Greedy)
Mathematical Reasoning	MATH	Accuracy	85.9	Qwen2.5-Math-72B-Instruct(COT,Greedy)
Mathematical Reasoning	MATH	Parameters (Billions)	72	Qwen2.5-Math-72B-Instruct(COT,Greedy)
Mathematical Reasoning	MATH	Accuracy	85.2	Qwen2.5-Math-7B-Instruct(TIR,Greedy)
Mathematical Reasoning	MATH	Parameters (Billions)	7	Qwen2.5-Math-7B-Instruct(TIR,Greedy)
Mathematical Reasoning	MATH	Accuracy	83.6	Qwen2.5-Math-7B-Instruct(COT,Greedy)
Mathematical Reasoning	MATH	Parameters (Billions)	7	Qwen2.5-Math-7B-Instruct(COT,Greedy)
Mathematical Reasoning	MATH	Accuracy	79.9	Qwen2.5-Math-1.5B-Instruct(TIR,Greedy)
Mathematical Reasoning	MATH	Parameters (Billions)	1.5	Qwen2.5-Math-1.5B-Instruct(TIR,Greedy)
Mathematical Reasoning	MATH	Accuracy	75.8	Qwen2.5-Math-1.5B-Instruct(COT,Greedy)
Mathematical Reasoning	MATH	Parameters (Billions)	1.5	Qwen2.5-Math-1.5B-Instruct(COT,Greedy)

