DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y. K. Li, Y. Wu, Daya Guo

2024-02-05
Mathematical Reasoning, Math, Math Word Problem Solving, Arithmetic Reasoning

Abstract

Mathematical reasoning poses a significant challenge for language models due to its complex and structured nature. In this paper, we introduce DeepSeekMath 7B, which continues pre-training DeepSeek-Coder-Base-v1.5 7B with 120B math-related tokens sourced from Common Crawl, together with natural language and code data. DeepSeekMath 7B has achieved an impressive score of 51.7% on the competition-level MATH benchmark without relying on external toolkits and voting techniques, approaching the performance level of Gemini-Ultra and GPT-4. Self-consistency over 64 samples from DeepSeekMath 7B achieves 60.9% on MATH. The mathematical reasoning capability of DeepSeekMath is attributed to two key factors: First, we harness the significant potential of publicly available web data through a meticulously engineered data selection pipeline. Second, we introduce Group Relative Policy Optimization (GRPO), a variant of Proximal Policy Optimization (PPO), that enhances mathematical reasoning abilities while concurrently optimizing the memory usage of PPO.
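Two ideas in the abstract can be illustrated with a short sketch. The first function shows self-consistency as a plain majority vote over the final answers of several sampled completions (the abstract uses 64 samples). The second shows one way to compute group-relative advantages of the kind GRPO is built on. The abstract does not give the GRPO objective, so the sketch assumes the outcome-reward form in which each sample's reward is normalized by the mean and standard deviation of its own group. The function names, the `eps` term, and the choice of population standard deviation are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from statistics import mean, pstdev


def majority_vote(answers):
    """Self-consistency: return the most frequent final answer.

    `answers` holds one extracted final answer per sampled completion.
    Ties are broken in favor of the answer seen first.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    return Counter(answers).most_common(1)[0][0]


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples for the same question.

    Assumption: advantage_i = (r_i - mean(r)) / (std(r) + eps), using the
    population standard deviation. No learned value function is involved,
    which is the memory saving relative to PPO that the abstract refers to.
    """
    if not rewards:
        raise ValueError("need at least one reward")
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical final answers extracted from four sampled completions.
    print(majority_vote(["42", "41", "42", "7"]))
    # Hypothetical 0/1 correctness rewards for the same four samples.
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

When every sample in a group receives the same reward, the advantages are all zero under this normalization, so that group contributes no learning signal.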

Results

Task	Dataset	Metric	Value	Model
Question Answering	MATH	Accuracy	58.8	DeepSeekMATH-RL-7B (w/ code, greedy decoding)
Question Answering	MATH	Parameters (Billions)	7	DeepSeekMATH-RL-7B (w/ code, greedy decoding)
Question Answering	MATH	Accuracy	51.7	DeepSeekMATH-RL-7B (greedy decoding)
Question Answering	MATH	Parameters (Billions)	7	DeepSeekMATH-RL-7B (greedy decoding)
Math Word Problem Solving	MATH	Accuracy	58.8	DeepSeekMATH-RL-7B (w/ code, greedy decoding)
Math Word Problem Solving	MATH	Parameters (Billions)	7	DeepSeekMATH-RL-7B (w/ code, greedy decoding)
Math Word Problem Solving	MATH	Accuracy	51.7	DeepSeekMATH-RL-7B (greedy decoding)
Math Word Problem Solving	MATH	Parameters (Billions)	7	DeepSeekMATH-RL-7B (greedy decoding)
Mathematical Question Answering	MATH	Accuracy	58.8	DeepSeekMATH-RL-7B (w/ code, greedy decoding)
Mathematical Question Answering	MATH	Parameters (Billions)	7	DeepSeekMATH-RL-7B (w/ code, greedy decoding)
Mathematical Question Answering	MATH	Accuracy	51.7	DeepSeekMATH-RL-7B (greedy decoding)
Mathematical Question Answering	MATH	Parameters (Billions)	7	DeepSeekMATH-RL-7B (greedy decoding)
Mathematical Reasoning	MATH	Accuracy	58.8	DeepSeekMATH-RL-7B (w/ code, greedy decoding)
Mathematical Reasoning	MATH	Parameters (Billions)	7	DeepSeekMATH-RL-7B (w/ code, greedy decoding)
Mathematical Reasoning	MATH	Accuracy	51.7	DeepSeekMATH-RL-7B (greedy decoding)
Mathematical Reasoning	MATH	Parameters (Billions)	7	DeepSeekMATH-RL-7B (greedy decoding)
Arithmetic Reasoning	GSM8K	Accuracy	88.2	DeepSeekMATH-RL-7B
Arithmetic Reasoning	GSM8K	Parameters (Billions)	7	DeepSeekMATH-RL-7B
