MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning

Ke Wang, Houxing Ren, Aojun Zhou, Zimu Lu, Sichun Luo, Weikang Shi, Renrui Zhang, Linqi Song, Mingjie Zhan, Hongsheng Li

2023-10-05

Mathematical Reasoning, Math, Math Word Problem Solving, GSM8K, Arithmetic Reasoning


Abstract

The recently released GPT-4 Code Interpreter has demonstrated remarkable proficiency in solving challenging math problems, primarily attributed to its ability to seamlessly reason with natural language, generate code, execute code, and continue reasoning based on the execution output. In this paper, we present a method to fine-tune open-source language models, enabling them to use code for modeling and deriving math equations and, consequently, enhancing their mathematical reasoning abilities. We propose a method of generating novel and high-quality datasets with math problems and their code-based solutions, referred to as MathCodeInstruct. Each solution interleaves natural language, code, and execution results. We also introduce a customized supervised fine-tuning and inference approach. This approach yields the MathCoder models, a family of models capable of generating code-based solutions for solving challenging math problems. Impressively, the MathCoder models achieve state-of-the-art scores among open-source LLMs on the MATH (45.2%) and GSM8K (83.9%) datasets, substantially outperforming other open-source alternatives. Notably, the MathCoder model not only surpasses ChatGPT-3.5 and PaLM-2 on GSM8K and MATH but also outperforms GPT-4 on the competition-level MATH dataset. The dataset and models will be released at https://github.com/mathllm/MathCoder.
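The interleaving of natural language, code, and execution results described in the abstract can be pictured with a minimal sketch. Everything named here is an assumption made for illustration, not the authors' implementation: the block kinds (`text`, `code`, `execution`), the `solve` and `run_code` helpers, the "Final answer" stopping convention, and `scripted_model`, a hypothetical stand-in that replays a fixed script where a fine-tuned model would generate the next block.

```python
import contextlib
import io
from typing import Callable, List, Tuple

# (kind, content); kind is one of "text", "code", "execution" (assumed labels).
Block = Tuple[str, str]


def run_code(source: str, namespace: dict) -> str:
    """Execute one code block and return what it printed, or the error raised."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(source, namespace)  # unsandboxed: acceptable for a sketch only
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return buffer.getvalue().strip()


def solve(problem: str,
          next_block: Callable[[str, List[Block]], Block],
          max_blocks: int = 8) -> List[Block]:
    """Alternate between model-produced blocks and code execution results."""
    history: List[Block] = []
    namespace: dict = {}  # shared so later code blocks can reuse variables
    for _ in range(max_blocks):
        kind, content = next_block(problem, history)
        history.append((kind, content))
        if kind == "code":
            # Feed the execution output back so the next block can reason on it.
            history.append(("execution", run_code(content, namespace)))
        elif kind == "text" and content.startswith("Final answer"):
            break
    return history


def scripted_model(problem: str, history: List[Block]) -> Block:
    """Hypothetical stand-in for a fine-tuned model: replays a fixed script."""
    if not history:
        return ("text", "Compute the total cost with code.")
    if history[-1][0] == "text":
        return ("code", "total = 3 * 4 + 5\nprint(total)")
    return ("text", f"Final answer: {history[-1][1]}")


if __name__ == "__main__":
    question = "Three pens cost 4 each and a notebook costs 5. What is the total?"
    for kind, content in solve(question, scripted_model):
        print(f"[{kind}] {content}")
```

In this sketch the execution result is appended to the history before the next block is requested, which is what lets the final text block quote the computed value. A real system would replace `scripted_model` with model inference and run the code in a sandbox.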

Results

Task	Dataset	Metric	Value	Model
Question Answering	MATH	Accuracy	45.2	MathCoder-CL-34B
Question Answering	MATH	Parameters (Billions)	34	MathCoder-CL-34B
Question Answering	MATH	Accuracy	45.1	MathCoder-L-34B
Question Answering	MATH	Parameters (Billions)	34	MathCoder-L-34B
Question Answering	MATH	Accuracy	35.9	MathCoder-CL-13B
Question Answering	MATH	Parameters (Billions)	13	MathCoder-CL-13B
Question Answering	MATH	Accuracy	30.2	MathCoder-CL-7B
Question Answering	MATH	Parameters (Billions)	7	MathCoder-CL-7B
Question Answering	MATH	Accuracy	29.9	MathCoder-L-13B
Question Answering	MATH	Parameters (Billions)	13	MathCoder-L-13B
Question Answering	MATH	Accuracy	23.3	MathCoder-L-7B
Question Answering	MATH	Parameters (Billions)	7	MathCoder-L-7B
Question Answering	SVAMP	Execution Accuracy	84.9	MathCoder-L-70B
Math Word Problem Solving	MATH	Accuracy	45.2	MathCoder-CL-34B
Math Word Problem Solving	MATH	Parameters (Billions)	34	MathCoder-CL-34B
Math Word Problem Solving	MATH	Accuracy	45.1	MathCoder-L-34B
Math Word Problem Solving	MATH	Parameters (Billions)	34	MathCoder-L-34B
Math Word Problem Solving	MATH	Accuracy	35.9	MathCoder-CL-13B
Math Word Problem Solving	MATH	Parameters (Billions)	13	MathCoder-CL-13B
Math Word Problem Solving	MATH	Accuracy	30.2	MathCoder-CL-7B
Math Word Problem Solving	MATH	Parameters (Billions)	7	MathCoder-CL-7B
Math Word Problem Solving	MATH	Accuracy	29.9	MathCoder-L-13B
Math Word Problem Solving	MATH	Parameters (Billions)	13	MathCoder-L-13B
Math Word Problem Solving	MATH	Accuracy	23.3	MathCoder-L-7B
Math Word Problem Solving	MATH	Parameters (Billions)	7	MathCoder-L-7B
Math Word Problem Solving	SVAMP	Execution Accuracy	84.9	MathCoder-L-70B
Mathematical Question Answering	MATH	Accuracy	45.2	MathCoder-CL-34B
Mathematical Question Answering	MATH	Parameters (Billions)	34	MathCoder-CL-34B
Mathematical Question Answering	MATH	Accuracy	45.1	MathCoder-L-34B
Mathematical Question Answering	MATH	Parameters (Billions)	34	MathCoder-L-34B
Mathematical Question Answering	MATH	Accuracy	35.9	MathCoder-CL-13B
Mathematical Question Answering	MATH	Parameters (Billions)	13	MathCoder-CL-13B
Mathematical Question Answering	MATH	Accuracy	30.2	MathCoder-CL-7B
Mathematical Question Answering	MATH	Parameters (Billions)	7	MathCoder-CL-7B
Mathematical Question Answering	MATH	Accuracy	29.9	MathCoder-L-13B
Mathematical Question Answering	MATH	Parameters (Billions)	13	MathCoder-L-13B
Mathematical Question Answering	MATH	Accuracy	23.3	MathCoder-L-7B
Mathematical Question Answering	MATH	Parameters (Billions)	7	MathCoder-L-7B
Mathematical Question Answering	SVAMP	Execution Accuracy	84.9	MathCoder-L-70B
Mathematical Reasoning	MATH	Accuracy	45.2	MathCoder-CL-34B
Mathematical Reasoning	MATH	Parameters (Billions)	34	MathCoder-CL-34B
Mathematical Reasoning	MATH	Accuracy	45.1	MathCoder-L-34B
Mathematical Reasoning	MATH	Parameters (Billions)	34	MathCoder-L-34B
Mathematical Reasoning	MATH	Accuracy	35.9	MathCoder-CL-13B
Mathematical Reasoning	MATH	Parameters (Billions)	13	MathCoder-CL-13B
Mathematical Reasoning	MATH	Accuracy	30.2	MathCoder-CL-7B
Mathematical Reasoning	MATH	Parameters (Billions)	7	MathCoder-CL-7B
Mathematical Reasoning	MATH	Accuracy	29.9	MathCoder-L-13B
Mathematical Reasoning	MATH	Parameters (Billions)	13	MathCoder-L-13B
Mathematical Reasoning	MATH	Accuracy	23.3	MathCoder-L-7B
Mathematical Reasoning	MATH	Parameters (Billions)	7	MathCoder-L-7B
Mathematical Reasoning	SVAMP	Execution Accuracy	84.9	MathCoder-L-70B
Arithmetic Reasoning	GSM8K	Accuracy	83.9	MathCoder-L-70B
Arithmetic Reasoning	GSM8K	Parameters (Billion)	70	MathCoder-L-70B
Arithmetic Reasoning	GSM8K	Accuracy	81.7	MathCoder-CL-34B
Arithmetic Reasoning	GSM8K	Parameters (Billion)	34	MathCoder-CL-34B
Arithmetic Reasoning	GSM8K	Accuracy	74.1	MathCoder-CL-13B
Arithmetic Reasoning	GSM8K	Parameters (Billion)	13	MathCoder-CL-13B
Arithmetic Reasoning	GSM8K	Accuracy	72.6	MathCoder-L-13B
Arithmetic Reasoning	GSM8K	Parameters (Billion)	13	MathCoder-L-13B
Arithmetic Reasoning	GSM8K	Accuracy	67.8	MathCoder-CL-7B
Arithmetic Reasoning	GSM8K	Parameters (Billion)	7	MathCoder-CL-7B
Arithmetic Reasoning	GSM8K	Accuracy	64.2	MathCoder-L-7B
Arithmetic Reasoning	GSM8K	Parameters (Billion)	7	MathCoder-L-7B

