Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y. K. Li, Y. Wu, Daya Guo

2024-02-05 | Mathematical Reasoning | Math | Math Word Problem Solving | Arithmetic Reasoning

Abstract

Mathematical reasoning poses a significant challenge for language models due to its complex and structured nature. In this paper, we introduce DeepSeekMath 7B, which continues pre-training DeepSeek-Coder-Base-v1.5 7B with 120B math-related tokens sourced from Common Crawl, together with natural language and code data. DeepSeekMath 7B has achieved an impressive score of 51.7% on the competition-level MATH benchmark without relying on external toolkits and voting techniques, approaching the performance level of Gemini-Ultra and GPT-4. Self-consistency over 64 samples from DeepSeekMath 7B achieves 60.9% on MATH. The mathematical reasoning capability of DeepSeekMath is attributed to two key factors: First, we harness the significant potential of publicly available web data through a meticulously engineered data selection pipeline. Second, we introduce Group Relative Policy Optimization (GRPO), a variant of Proximal Policy Optimization (PPO), that enhances mathematical reasoning abilities while concurrently optimizing the memory usage of PPO.
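
The abstract names two techniques that a short sketch may make more concrete: self-consistency (sampling several solutions and keeping the most common final answer) and the group-relative baseline that gives GRPO its name. The Python below is an illustrative sketch, not the authors' implementation. The function names `majority_vote` and `group_relative_advantages` are hypothetical, and normalizing each reward by the group mean and standard deviation is an assumption about how the group-relative advantage is computed; the abstract itself only states that GRPO is a PPO variant that reduces memory usage.

```python
from collections import Counter
from statistics import mean, pstdev


def majority_vote(answers):
    """Self-consistency: return the most frequent final answer.

    `answers` holds the final answers extracted from several sampled
    solutions to the same problem (hypothetical helper, for illustration).
    Ties are broken in favor of the answer that appeared first.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    return Counter(answers).most_common(1)[0][0]


def group_relative_advantages(rewards, eps=1e-8):
    """Score each sample relative to the other samples in its group.

    Assumption: the advantage is the reward minus the group mean,
    divided by the group standard deviation. Using group statistics
    as the baseline avoids training a separate value model, which is
    one plausible reading of the memory saving the abstract mentions.
    """
    if not rewards:
        raise ValueError("need at least one reward")
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every reward in the group is equal.
    return [(r - mu) / (sigma + eps) for r in rewards]
```

With 64 sampled solutions per problem, `majority_vote` would receive a list of 64 extracted answers. In `group_relative_advantages`, samples that score above the group average get a positive advantage and those below get a negative one, so a group with identical rewards contributes no learning signal.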

Results

Task                             Dataset  Metric                 Value  Model
Question Answering               MATH     Accuracy (%)           58.8   DeepSeekMATH-RL-7B (w/ code, greedy decoding)
Question Answering               MATH     Parameters (Billions)  7      DeepSeekMATH-RL-7B (w/ code, greedy decoding)
Question Answering               MATH     Accuracy (%)           51.7   DeepSeekMATH-RL-7B (greedy decoding)
Question Answering               MATH     Parameters (Billions)  7      DeepSeekMATH-RL-7B (greedy decoding)
Math Word Problem Solving        MATH     Accuracy (%)           58.8   DeepSeekMATH-RL-7B (w/ code, greedy decoding)
Math Word Problem Solving        MATH     Parameters (Billions)  7      DeepSeekMATH-RL-7B (w/ code, greedy decoding)
Math Word Problem Solving        MATH     Accuracy (%)           51.7   DeepSeekMATH-RL-7B (greedy decoding)
Math Word Problem Solving        MATH     Parameters (Billions)  7      DeepSeekMATH-RL-7B (greedy decoding)
Mathematical Question Answering  MATH     Accuracy (%)           58.8   DeepSeekMATH-RL-7B (w/ code, greedy decoding)
Mathematical Question Answering  MATH     Parameters (Billions)  7      DeepSeekMATH-RL-7B (w/ code, greedy decoding)
Mathematical Question Answering  MATH     Accuracy (%)           51.7   DeepSeekMATH-RL-7B (greedy decoding)
Mathematical Question Answering  MATH     Parameters (Billions)  7      DeepSeekMATH-RL-7B (greedy decoding)
Mathematical Reasoning           MATH     Accuracy (%)           58.8   DeepSeekMATH-RL-7B (w/ code, greedy decoding)
Mathematical Reasoning           MATH     Parameters (Billions)  7      DeepSeekMATH-RL-7B (w/ code, greedy decoding)
Mathematical Reasoning           MATH     Accuracy (%)           51.7   DeepSeekMATH-RL-7B (greedy decoding)
Mathematical Reasoning           MATH     Parameters (Billions)  7      DeepSeekMATH-RL-7B (greedy decoding)
Arithmetic Reasoning             GSM8K    Accuracy (%)           88.2   DeepSeekMATH-RL-7B
Arithmetic Reasoning             GSM8K    Parameters (Billions)  7      DeepSeekMATH-RL-7B

Related Papers

VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks (2025-07-17)
QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation (2025-07-17)
A Survey of Deep Learning for Geometry Problem Solving (2025-07-16)
Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training (2025-07-16)
KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning? (2025-07-15)
Temperature and Persona Shape LLM Agent Consensus With Minimal Accuracy Gains in Qualitative Coding (2025-07-15)
Personalized Exercise Recommendation with Semantically-Grounded Knowledge Tracing (2025-07-15)
DCR: Quantifying Data Contamination in LLMs Evaluation (2025-07-15)