Math Word Problem Solving on MATH

Metric: Parameters (Billions) (higher is better)

Results

#	Model	Parameters (Billions)	Extra Data	Paper	Date	Code
1	Minerva 540B	540	No	Solving Quantitative Reasoning Problems with Language Models	2022-06-29	Code
2	Minerva 540B (5-shot) mCoT	540	No	Galactica: A Large Language Model for Science	2022-11-16	Code
3	PaLM 540B	540	No	Solving Quantitative Reasoning Problems with Language Models	2022-06-29	Code
4	PaLM 540B (5-shot) mCoT	540	No	Galactica: A Large Language Model for Science	2022-11-16	Code
5	davinci-002 175B	175	No	Solving Quantitative Reasoning Problems with Language Models	2022-06-29	Code
6	GPT-3-175B (few-shot)	175	No	Measuring Mathematical Problem Solving With the MATH Dataset	2021-03-05	Code
7	GPT-3 175B (8-shot)	175	No	Galactica: A Large Language Model for Science	2022-11-16	Code
8	GAL 120B (5-shot) mCoT	120	No	Galactica: A Large Language Model for Science	2022-11-16	Code
9	GAL 120B <work>	120	No	Galactica: A Large Language Model for Science	2022-11-16	Code
10	Qwen2.5-Math-72B-Instruct(TIR,Greedy)	72	Yes	Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement	2024-09-18	-
11	Qwen2.5-Math-72B-Instruct(COT,Greedy)	72	Yes	Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement	2024-09-18	-
12	Qwen2-Math-72B-Instruct(greedy)	72	Yes	Qwen2 Technical Report	2024-07-15	Code
13	MMIQC-72B	72	Yes	Augmenting Math Word Problems via Iterative Question Composing	2024-01-17	Code
14	OpenMath-CodeLlama-70B (w/ code, SC, k=50)	70	Yes	OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset	2024-02-15	Code
15	OpenMath-Llama2-70B (w/ code, SC, k=50)	70	Yes	OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset	2024-02-15	Code
16	ToRA 70B (w/ code, SC, k=50)	70	Yes	ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving	2023-09-29	Code
17	DART-Math-Llama3-70B-Prop2Diff (0-shot CoT, w/o code)	70	Yes	DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving	2024-06-18	Code
18	DART-Math-Llama3-70B-Uniform (0-shot CoT, w/o code)	70	Yes	DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving	2024-06-18	Code
19	OpenMath-CodeLlama-70B (w/ code)	70	Yes	OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset	2024-02-15	Code
20	ToRA 70B (w/ code)	70	Yes	ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving	2023-09-29	Code
21	OpenMath-Llama2-70B (w/ code)	70	Yes	OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset	2024-02-15	Code
22	MuggleMATH-70B	70	Yes	MuggleMath: Assessing the Impact of Query and Response Augmentation on Math Reasoning	2023-10-09	Code
23	MetaMath 70B	70	Yes	MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models	2023-09-21	Code
24	WizardMath-70B-V1.0	70	Yes	WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct	2023-08-18	Code
25	Shepherd + DeepSeek-67B (SFT on MetaMATH + PRM rerank, k=256)	67	Yes	Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations	2023-12-14	Code
26	LLaMA 65B (maj1@k)	65	No	LLaMA: Open and Efficient Foundation Language Models	2023-02-27	Code
27	LLaMA 65B	65	No	LLaMA: Open and Efficient Foundation Language Models	2023-02-27	Code
28	Minerva 62B (maj5@256)	62	No	Solving Quantitative Reasoning Problems with Language Models	2022-06-29	Code
29	Minerva 62B (maj1@k, k=64)	62	No	Solving Quantitative Reasoning Problems with Language Models	2022-06-29	Code
30	Minerva 62B (4-shot)	62	No	Solving Quantitative Reasoning Problems with Language Models	2022-06-29	Code
31	PaLM 62B	62	No	Solving Quantitative Reasoning Problems with Language Models	2022-06-29	Code
32	OpenMath-CodeLlama-34B (w/ code, SC, k=50)	34	Yes	OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset	2024-02-15	Code
33	ToRA-Code 34B model (w/ code, SC, k=50)	34	Yes	ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving	2023-09-29	Code
34	ToRA-Code 34B (w/ code)	34	Yes	ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving	2023-09-29	Code
35	MMOS-CODE-34B(0-shot)	34	Yes	An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning	2024-02-23	Code
36	Llemma-34B-KPMath-Plus	34	No	Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning	2024-03-04	-
37	OpenMath-CodeLlama-34B (w/ code)	34	Yes	OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset	2024-02-15	Code
38	MathCoder-CL-34B	34	Yes	MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning	2023-10-05	Code
39	MathCoder-L-34B	34	Yes	MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning	2023-10-05	Code
40	LLaMA 33B-maj1@k	33	No	LLaMA: Open and Efficient Foundation Language Models	2023-02-27	Code
41	LLaMA 33B	33	No	LLaMA: Open and Efficient Foundation Language Models	2023-02-27	Code
42	GAL 30B (5-shot) mCoT	30	No	Galactica: A Large Language Model for Science	2022-11-16	Code
43	GAL 30B <work>	30	No	Galactica: A Large Language Model for Science	2022-11-16	Code
44	OpenMath-CodeLlama-13B (w/ code, SC, k=50)	13	Yes	OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset	2024-02-15	Code
45	ToRA-Code 13B (w/ code)	13	Yes	ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving	2023-09-29	Code
46	OpenMath-CodeLlama-13B (w/ code)	13	Yes	OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset	2024-02-15	Code
47	ToRA 13B (w/ code)	13	Yes	ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving	2023-09-29	Code
48	Llama2-13B-KPMath-Plus	13	No	Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning	2024-03-04	-
49	MathCoder-CL-13B	13	Yes	MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning	2023-10-05	Code
50	MuggleMATH-13B	13	Yes	MuggleMath: Assessing the Impact of Query and Response Augmentation on Math Reasoning	2023-10-09	Code
51	MathCoder-L-13B	13	Yes	MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning	2023-10-05	Code
52	MetaMath 13B	13	Yes	MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models	2023-09-21	Code
53	WizardMath-13B-V1.0	13	Yes	WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct	2023-08-18	Code
54	LLaMA 13B-maj1@k	13	No	LLaMA: Open and Efficient Foundation Language Models	2023-02-27	Code
55	GPT-3 13B	13	No	Measuring Mathematical Problem Solving With the MATH Dataset	2021-03-05	Code
56	LLaMA 13B	13	No	LLaMA: Open and Efficient Foundation Language Models	2023-02-27	Code
57	GPT-3-13B (few-shot)	13	No	Measuring Mathematical Problem Solving With the MATH Dataset	2021-03-05	Code
58	Minerva 8B (maj5@256)	8	No	Solving Quantitative Reasoning Problems with Language Models	2022-06-29	Code
59	DART-Math-Llama3-8B-Prop2Diff (0-shot CoT, w/o code)	8	Yes	DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving	2024-06-18	Code
60	DART-Math-Llama3-8B-Uniform (0-shot CoT, w/o code)	8	Yes	DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving	2024-06-18	Code
61	Minerva 8B (maj1@k, k=64)	8	No	Solving Quantitative Reasoning Problems with Language Models	2022-06-29	Code
62	Minerva 8B	8	No	Solving Quantitative Reasoning Problems with Language Models	2022-06-29	Code
63	PaLM 8B (fine-tuned)	8	No	Solving Quantitative Reasoning Problems with Language Models	2022-06-29	Code
64	PaLM 8B	8	No	Solving Quantitative Reasoning Problems with Language Models	2022-06-29	Code
65	Qwen2.5-Math-7B-Instruct(TIR,Greedy)	7	Yes	Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement	2024-09-18	-
66	Qwen2.5-Math-7B-Instruct(COT,Greedy)	7	Yes	Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement	2024-09-18	-
67	DAMOMath-7B	7	Yes	-	-	-
68	MMOS-DeepSeekMath-7B(0-shot,k=50)	7	Yes	An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning	2024-02-23	Code
69	DeepSeekMATH-RL-7B (w/ code, greedy decoding)	7	Yes	DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models	2024-02-05	Code
70	OpenMath-Mistral-7B (w/ code, SC, k=50)	7	Yes	OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset	2024-02-15	Code
71	OpenMath-CodeLlama-7B (w/ code, SC, k=50)	7	Yes	OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset	2024-02-15	Code
72	MMOS-DeepSeekMath-7B(0-shot)	7	Yes	An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning	2024-02-23	Code
73	DART-Math-DSMath-7B-Prop2Diff (0-shot CoT, w/o code)	7	Yes	DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving	2024-06-18	Code
74	DART-Math-DSMath-7B-Uniform (0-shot CoT, w/o code)	7	Yes	DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving	2024-06-18	Code
75	DeepSeekMATH-RL-7B (greedy decoding)	7	Yes	DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models	2024-02-05	Code
76	DeepSeekMath-7B-KPMath-Plus	7	No	Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning	2024-03-04	-
77	Mistral-7B-KPMath-Plus	7	Yes	Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning	2024-03-04	-
78	DART-Math-Mistral-7B-Prop2Diff (0-shot CoT, w/o code)	7	Yes	DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving	2024-06-18	Code
79	ToRA-Code 7B (w/ code)	7	Yes	ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving	2023-09-29	Code
80	OpenMath-Mistral-7B (w/ code)	7	Yes	OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset	2024-02-15	Code
81	MMOS-CODE-7B(0-shot)	7	Yes	An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning	2024-02-23	Code
82	OpenMath-CodeLlama-7B (w/ code)	7	Yes	OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset	2024-02-15	Code
83	Shepherd+Mistral-7B (SFT on MetaMATH + PRM RL+ PRM rerank, k=256)	7	Yes	Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations	2023-12-14	Code
84	DART-Math-Mistral-7B-Uniform (0-shot CoT, w/o code)	7	Yes	DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving	2024-06-18	Code
85	SFT-Mistral-7B	7	Yes	-	-	-
86	ToRA 7B (w/ code)	7	Yes	ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving	2023-09-29	Code
87	Shepherd + Mistral-7B (SFT on MetaMATH + PRM RL)	7	Yes	Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations	2023-12-14	Code
88	WizardMath-7B-V1.1	7	Yes	WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct	2023-08-18	Code
89	MathCoder-CL-7B	7	Yes	MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning	2023-10-05	Code
90	OpenChat-3.5-1210 7B	7	No	OpenChat: Advancing Open-source Language Models with Mixed-Quality Data	2023-09-20	Code
91	OpenChat-3.5 7B	7	No	OpenChat: Advancing Open-source Language Models with Mixed-Quality Data	2023-09-20	Code
92	MuggleMATH 7B	7	Yes	MuggleMath: Assessing the Impact of Query and Response Augmentation on Math Reasoning	2023-10-09	Code
93	MathCoder-L-7B	7	Yes	MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning	2023-10-05	Code
94	MetaMath 7B	7	Yes	MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models	2023-09-21	Code
95	Mistral 7B (maj@4)	7	No	Mixtral of Experts	2024-01-08	Code
96	Mistral 7B (maj@4)	7	No	Mixtral of Experts	2024-01-08	Code
97	WizardMath-7B-V1.0	7	Yes	WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct	2023-08-18	Code
98	LLaMA 7B-maj1@k	7	No	LLaMA: Open and Efficient Foundation Language Models	2023-02-27	Code
99	LLaMA 7B	7	No	LLaMA: Open and Efficient Foundation Language Models	2023-02-27	Code
100	GPT-3 2.7B	2.7	No	Measuring Mathematical Problem Solving With the MATH Dataset	2021-03-05	Code
101	Qwen2.5-Math-1.5B-Instruct(TIR,Greedy)	1.5	Yes	Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement	2024-09-18	-
102	Qwen2.5-Math-1.5B-Instruct(COT,Greedy)	1.5	Yes	Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement	2024-09-18	-
103	GPT-2 (1.5B)	1.5	No	Measuring Mathematical Problem Solving With the MATH Dataset	2021-03-05	Code
104	GPT-2 (0.7B)	0.7	No	Measuring Mathematical Problem Solving With the MATH Dataset	2021-03-05	Code
105	GPT-2 (0.3B)	0.3	No	Measuring Mathematical Problem Solving With the MATH Dataset	2021-03-05	Code
106	GPT-2 (0.1B)	0.1	No	Measuring Mathematical Problem Solving With the MATH Dataset	2021-03-05	Code
