Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct

Haipeng Luo, Qingfeng Sun, Can Xu, Pu Zhao, JianGuang Lou, Chongyang Tao, Xiubo Geng, Qingwei Lin, Shifeng Chen, Yansong Tang, Dongmei Zhang

Published 2023-08-18. Tags: Mathematical Reasoning, Math, Math Word Problem Solving, GSM8K, Arithmetic Reasoning

Abstract

Large language models (LLMs), such as GPT-4, have shown remarkable performance in natural language processing (NLP) tasks, including challenging mathematical reasoning. However, most existing open-source models are pre-trained only on large-scale internet data, without math-specific optimization. In this paper, we present WizardMath, which enhances the mathematical chain-of-thought (CoT) reasoning abilities of LLMs without using external Python tools by applying our proposed Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the domain of math. Through extensive experiments on two mathematical reasoning benchmarks, GSM8k and MATH, we reveal the extraordinary capabilities of our model. Remarkably, WizardMath-Mistral 7B surpasses top-tier open-source LLMs by a substantial margin with higher data efficiency. Furthermore, WizardMath 70B even outperforms GPT-3.5-Turbo, Claude 2, Gemini Pro, and GPT-4-early-version. Additionally, our preliminary exploration highlights the pivotal role of instruction evolution and process supervision in achieving exceptional math performance. For more details, refer to https://github.com/nlpxucan/WizardLM
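The abstract credits process supervision, that is, scoring each reasoning step rather than only the final answer, for part of the gain. The sketch below illustrates that idea under stated assumptions: per-step correctness probabilities are taken as given (a real system would obtain them from a trained process reward model), they are aggregated by their product, and the result is multiplied by a hypothetical instruction-quality score. The function names and the multiplicative combination are illustrative assumptions, not a reproduction of the RLEIF reward.

```python
import math
from typing import Sequence


def process_reward(step_scores: Sequence[float]) -> float:
    """Aggregate per-step correctness probabilities into one solution score.

    Assumption: product aggregation, so a single weak step lowers the
    score of the whole solution.
    """
    if not step_scores:
        raise ValueError("need at least one step score")
    if any(not 0.0 <= s <= 1.0 for s in step_scores):
        raise ValueError("step scores must be probabilities in [0, 1]")
    return math.prod(step_scores)


def combined_reward(instruction_score: float, step_scores: Sequence[float]) -> float:
    """Hypothetical combined reward: instruction quality times solution quality."""
    return instruction_score * process_reward(step_scores)


# Two solutions that may reach the same final answer; the second has a shaky step.
clean_steps = [0.95, 0.9, 0.98]
shaky_steps = [0.95, 0.2, 0.98]
print(round(combined_reward(0.8, clean_steps), 4))
print(round(combined_reward(0.8, shaky_steps), 4))
```

Under product aggregation, one low-confidence step drags down the reward of the whole solution even if the final answer is unchanged, which is the property that separates process supervision from rewarding the outcome alone.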

Results

MATH results are indexed under four tasks (Question Answering, Math Word Problem Solving, Mathematical Question Answering, Mathematical Reasoning) with identical values in each; GSM8K results are indexed under Arithmetic Reasoning.

Model                 Parameters (billions)   MATH accuracy (%)   GSM8K accuracy (%)
WizardMath-7B-V1.1    7                       33                  83.2
WizardMath-70B-V1.0   70                      22.7                81.6
WizardMath-13B-V1.0   13                      14                  63.9
WizardMath-7B-V1.0    7                       10.7                54.9
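The accuracy figures above are exact-match scores on final answers. As a rough illustration of how such a score can be computed for GSM8K, the sketch below relies on the fact that GSM8K reference solutions end with a line of the form "#### <answer>", and assumes (this is not the authors' evaluation script) that a model's final answer is the last number in its output. The helper names are hypothetical. MATH answers are usually LaTeX expressions inside \boxed{}, so scoring MATH needs answer normalization that this sketch does not attempt.

```python
import re
from typing import List, Optional

# An optional minus sign, digits with optional thousands separators,
# and an optional decimal part.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def reference_answer(solution: str) -> str:
    """GSM8K reference solutions end with a line of the form '#### <answer>'."""
    return solution.split("####")[-1].strip().replace(",", "")


def predicted_answer(completion: str) -> Optional[str]:
    """Assumption: the model's final answer is the last number in its output."""
    matches = _NUMBER.findall(completion)
    return matches[-1].replace(",", "") if matches else None


def accuracy(completions: List[str], solutions: List[str]) -> float:
    """Exact-match accuracy in percent, comparing answers as numbers."""
    if not solutions or len(completions) != len(solutions):
        raise ValueError("need equally sized, non-empty lists")
    correct = 0
    for completion, solution in zip(completions, solutions):
        pred = predicted_answer(completion)
        if pred is not None and float(pred) == float(reference_answer(solution)):
            correct += 1
    return 100.0 * correct / len(solutions)


completions = [
    "She sells 16 - 3 - 4 = 9 eggs and earns 9 * 2 = 18 dollars.",
    "The answer is 1,200.",
    "I am not sure.",
]
solutions = ["...\n#### 18", "...\n#### 1200", "...\n#### 5"]
print(accuracy(completions, solutions))
```

Reported numbers depend heavily on the extraction rule and prompt format, so a score from a sketch like this should not be compared directly with the values in the table.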

Related Papers

VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks (2025-07-17)
QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation (2025-07-17)
GEMMAS: Graph-based Evaluation Metrics for Multi Agent Systems (2025-07-17)
A Survey of Deep Learning for Geometry Problem Solving (2025-07-16)
Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training (2025-07-16)
DAC: A Dynamic Attention-aware Approach for Task-Agnostic Prompt Compression (2025-07-16)
KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning? (2025-07-15)
Temperature and Persona Shape LLM Agent Consensus With Minimal Accuracy Gains in Qualitative Coding (2025-07-15)