Arithmetic Reasoning on GSM8K

Metric: Parameters (Billion) (higher is better)

Results

#	Model	Parameters (Billion)	Extra Data	Paper	Date	Code
1	PaLM 540B (Self Improvement, Self Consistency)	540	No	Large Language Models Can Self-Improve	2022-10-20	-
2	Minerva 540B (CoT)	540	No	Solving Quantitative Reasoning Problems with Language Models	2022-06-29	Code
3	PaLM 540B maj1@40 (8-shot)	540	Yes	Self-Consistency Improves Chain of Thought Reasoning in Language Models	2022-03-21	Code
4	PaLM 540B (Self Consistency)	540	No	Large Language Models Can Self-Improve	2022-10-20	-
5	PaLM 540B (Self Improvement, CoT Prompting)	540	No	Large Language Models Can Self-Improve	2022-10-20	-
6	U-PaLM	540	No	Transcending Scaling Laws with 0.1% Extra Compute	2022-10-20	-
7	PaLM-540B (few-Shot-cot)	540	Yes	Large Language Models are Zero-Shot Reasoners	2022-05-24	Code
8	PaLM 540B (8-shot)	540	Yes	Solving Quantitative Reasoning Problems with Language Models	2022-06-29	Code
9	PaLM 540B (CoT Prompting)	540	No	Large Language Models Can Self-Improve	2022-10-20	-
10	PaLM 540B (Self Improvement, Standard-Prompting)	540	No	Large Language Models Can Self-Improve	2022-10-20	-
11	PaLM 540B (few-shot)	540	Yes	Large Language Models are Zero-Shot Reasoners	2022-05-24	Code
12	PaLM 540B (Standard-Prompting)	540	No	Large Language Models Can Self-Improve	2022-10-20	-
13	code-davinci-002 175B (LEVER, 8-shot)	175	No	LEVER: Learning to Verify Language-to-Code Generation with Execution	2023-02-16	Code
14	DIVERSE 175B (8-shot)	175	No	Making Large Language Models Better Reasoners with Step-Aware Verifier	2022-06-06	-
15	code-davinci-002 (Least-to-Most Prompting)	175	No	Least-to-Most Prompting Enables Complex Reasoning in Large Language Models	2022-05-21	Code
16	Finetuned GPT-3 175B + verifier	175	Yes	Large Language Models are Zero-Shot Reasoners	2022-05-24	Code
17	Text-davinci-002-175B (zero-plus-few-Shot-cot (8 samples))	175	Yes	Large Language Models are Zero-Shot Reasoners	2022-05-24	Code
18	text-davinci-002 175B (2-shot, CoT)	175	Yes	Large Language Models are Zero-Shot Reasoners	2022-05-24	Code
19	text-davinci-002 175B (0-shot, CoT)	175	Yes	Large Language Models are Zero-Shot Reasoners	2022-05-24	Code
20	Text-davinci-002-175B (0-shot)	175	Yes	Large Language Models are Zero-Shot Reasoners	2022-05-24	Code
21	RFT 70B	79	Yes	Scaling Relationship on Learning Mathematical Reasoning with Large Language Models	2023-08-03	Code
22	Jiutian-大模型	75	No	-	-	-
23	Qwen2-Math-72B-Instruct (greedy)	72	Yes	Qwen2 Technical Report	2024-07-15	Code
24	AlphaLLM (with MCTS)	70	No	Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing	2024-04-18	Code
25	OpenMath-CodeLlama-70B (w/ code, SC, k=50)	70	Yes	OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset	2024-02-15	Code
26	DART-Math-Llama3-70B-Uniform (0-shot CoT, w/o code)	70	Yes	DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving	2024-06-18	Code
27	OpenMath-Llama2-70B (w/ code, SC, k=50)	70	Yes	OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset	2024-02-15	Code
28	DART-Math-Llama3-70B-Prop2Diff (0-shot CoT, w/o code)	70	Yes	DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving	2024-06-18	Code
29	ToRA-70B (SC, k=50)	70	Yes	ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving	2023-09-29	Code
30	DeepMind 70B Model (SFT+ORM-RL, ORM reranking)	70	Yes	Solving math word problems with process- and outcome-based feedback	2022-11-25	-
31	DeepMind 70B Model (SFT+PRM-RL, PRM reranking)	70	Yes	Solving math word problems with process- and outcome-based feedback	2022-11-25	-
32	OpenMath-Llama2-70B (w/ code)	70	Yes	OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset	2024-02-15	Code
33	OpenMath-CodeLlama-70B (w/ code)	70	Yes	OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset	2024-02-15	Code
34	ToRA 70B	70	Yes	ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving	2023-09-29	Code
35	MathCoder-L-70B	70	Yes	MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning	2023-10-05	Code
36	MetaMath 70B	70	Yes	MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models	2023-09-21	Code
37	MuggleMATH 70B	70	Yes	MuggleMath: Assessing the Impact of Query and Response Augmentation on Math Reasoning	2023-10-09	Code
38	WizardMath-70B-V1.0	70	Yes	WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct	2023-08-18	Code
39	DeepMind 70B Model (STaR, maj1@96)	70	Yes	Solving math word problems with process- and outcome-based feedback	2022-11-25	-
40	Llama-2 70B (on 100 first questions, 4-shot, auto-optimized prompting)	70	No	The Unreasonable Effectiveness of Eccentric Automatic Prompts	2024-02-09	-
41	LLaMA 2 70B (CoT-Influx)	70	No	Fewer is More: Boosting LLM Reasoning with Reinforced Context Pruning	2023-12-14	-
42	LLaMA 2 70B (on-shot)	70	No	Llama 2: Open Foundation and Fine-Tuned Chat Models	2023-07-18	Code
43	LLaMA 65B-maj1@k	65	No	LLaMA: Open and Efficient Foundation Language Models	2023-02-27	Code
44	LLaMA 65B	65	No	LLaMA: Open and Efficient Foundation Language Models	2023-02-27	Code
45	Minerva 62B (maj5@100)	62	No	Solving Quantitative Reasoning Problems with Language Models	2022-06-29	Code
46	Minerva 62B (maj1@100)	62	Yes	Solving Quantitative Reasoning Problems with Language Models	2022-06-29	Code
47	Minerva 62B (8-shot)	62	Yes	Solving Quantitative Reasoning Problems with Language Models	2022-06-29	Code
48	PaLM 62B (8-shot)	62	Yes	Solving Quantitative Reasoning Problems with Language Models	2022-06-29	Code
49	OpenMath-CodeLlama-34B (w/ code, SC, k=50)	34	Yes	OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset	2024-02-15	Code
50	ToRA-Code-34B (SC, k=50)	34	Yes	ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving	2023-09-29	Code
51	MathCoder-CL-34B	34	Yes	MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning	2023-10-05	Code
52	ToRA-Code 34B	34	Yes	ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving	2023-09-29	Code
53	OpenMath-CodeLlama-34B (w/ code)	34	Yes	OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset	2024-02-15	Code
54	MMOS-CODE-34B(0-shot)	34	Yes	An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning	2024-02-23	Code
55	Llemma 34B	34	No	Llemma: An Open Language Model For Mathematics	2023-10-16	Code
56	LLaMA 33B-maj1@k	33	No	LLaMA: Open and Efficient Foundation Language Models	2023-02-27	Code
57	LLaMA 33B	33	No	LLaMA: Open and Efficient Foundation Language Models	2023-02-27	Code
58	UL2 20B (chain-of-thought)	20	No	UL2: Unifying Language Learning Paradigms	2022-05-10	Code
59	UL2 20B (0-shot)	20	No	UL2: Unifying Language Learning Paradigms	2022-05-10	Code
60	Llama SFT (Metamath ToRA Ensemble)	13	Yes	-	-	-
61	OpenMath-CodeLlama-13B (w/ code, SC, k=50)	13	Yes	OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset	2024-02-15	Code
62	OpenMath-CodeLlama-13B (w/ code)	13	Yes	OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset	2024-02-15	Code
63	ToRA-Code 13B	13	Yes	ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving	2023-09-29	Code
64	MuggleMATH 13B	13	Yes	MuggleMath: Assessing the Impact of Query and Response Augmentation on Math Reasoning	2023-10-09	Code
65	KwaiYiiMath 13B	13	Yes	KwaiYiiMath: Technical Report	2023-10-11	-
66	MathCoder-L-13B	13	Yes	MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning	2023-10-05	Code
67	MetaMath 13B	13	Yes	MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models	2023-09-21	Code
68	WizardMath-13B-V1.0	13	Yes	WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct	2023-08-18	Code
69	Orca 2 13B	13	No	Orca 2: Teaching Small Language Models How to Reason	2023-11-18	-
70	RFT 13B	13	Yes	Scaling Relationship on Learning Mathematical Reasoning with Large Language Models	2023-08-03	Code
71	Llama-2 13B (on 100 first questions, 4-shot, auto-optimized prompting)	13	No	The Unreasonable Effectiveness of Eccentric Automatic Prompts	2024-02-09	-
72	Vicuna (SYRELM)	13	Yes	Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning	2023-12-09	Code
73	LLaMA 13B-maj1@k	13	No	LLaMA: Open and Efficient Foundation Language Models	2023-02-27	Code
74	LLaMA 13B	13	No	LLaMA: Open and Efficient Foundation Language Models	2023-02-27	Code
75	GPT-J (CoRe)	12	No	Solving Math Word Problems via Cooperative Reasoning induced Language Models	2022-10-28	Code
76	DART-Math-Llama3-8B-Uniform (0-shot CoT, w/o code)	8	Yes	DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving	2024-06-18	Code
77	DART-Math-Llama3-8B-Prop2Diff (0-shot CoT, w/o code)	8	Yes	DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving	2024-06-18	Code
78	Minerva 8B (maj5@100)	8	No	Solving Quantitative Reasoning Problems with Language Models	2022-06-29	Code
79	Minerva 8B-maj1@k (8-shot)	8	Yes	Solving Quantitative Reasoning Problems with Language Models	2022-06-29	Code
80	Minerva 8B (8-shot)	8	Yes	Solving Quantitative Reasoning Problems with Language Models	2022-06-29	Code
81	PaLM 8B (8-shot)	8	Yes	Solving Quantitative Reasoning Problems with Language Models	2022-06-29	Code
82	SFT-Mistral-7B (Metamath, OVM, Smart Ensemble)	7	Yes	-	-	-
83	DAMOMath-7B(MetaMath, OVM, BS, Ensemble)	7	Yes	-	-	-
84	SFT-Mistral-7B (Metamath + ovm +ensemble)	7	Yes	-	-	-
85	DAMOMath-7B(MetaMath, OVM, Ensemble)	7	Yes	-	-	-
86	Shepherd+Mistral-7B (SFT on MetaMATH + PRM RL+ PRM rerank, k=256)	7	Yes	Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations	2023-12-14	Code
87	DeepSeekMATH-RL-7B	7	Yes	DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models	2024-02-05	Code
88	DART-Math-DSMath-7B-Uniform (0-shot CoT, w/o code)	7	Yes	DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving	2024-06-18	Code
89	MMOS-DeepSeekMath-7B(0-shot,k=50)	7	Yes	An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning	2024-02-23	Code
90	OpenMath-Mistral-7B (w/ code, SC, k=50)	7	Yes	OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset	2024-02-15	Code
91	Orca-Math 7B (fine-tuned)	7	Yes	Orca-Math: Unlocking the potential of SLMs in Grade School Math	2024-02-16	-
92	DART-Math-DSMath-7B-Prop2Diff (0-shot CoT, w/o code)	7	Yes	DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving	2024-06-18	Code
93	OpenMath-CodeLlama-7B (w/ code, SC, k=50)	7	Yes	OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset	2024-02-15	Code
94	OVM-Mistral-7B (verify100@1)	7	No	OVM, Outcome-supervised Value Models for Planning in Mathematical Reasoning	2023-11-16	Code
95	Shepherd + Mistral-7B (SFT on MetaMATH + PRM RL)	7	Yes	Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations	2023-12-14	Code
96	WizardMath-7B-V1.1	7	Yes	WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct	2023-08-18	Code
97	OVM-Mistral-7B (verify20@1)	7	No	OVM, Outcome-supervised Value Models for Planning in Mathematical Reasoning	2023-11-16	Code
98	DART-Math-Mistral-7B-Uniform (0-shot CoT, w/o code)	7	Yes	DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving	2024-06-18	Code
99	DART-Math-Mistral-7B-Prop2Diff (0-shot CoT, w/o code)	7	Yes	DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving	2024-06-18	Code
100	MMOS-DeepSeekMath-7B(0-shot)	7	Yes	An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning	2024-02-23	Code
101	OpenMath-Mistral-7B (w/ code)	7	Yes	OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset	2024-02-15	Code
102	MetaMath-Mistral-7B	7	Yes	MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models	2023-09-21	Code
103	OpenChat-3.5 7B	7	No	OpenChat: Advancing Open-source Language Models with Mixed-Quality Data	2023-09-20	Code
104	Arithmo2-Mistral-7B	7	No	-	-	-
105	OpenMath-CodeLlama-7B (w/ code)	7	Yes	OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset	2024-02-15	Code
106	Arithmo-Mistral-7B	7	No	-	-	-
107	MathCoder-CL-13B	7	Yes	MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning	2023-10-05	Code
108	MMOS-CODE-7B(0-shot)	7	Yes	An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning	2024-02-23	Code
109	OVM-Llama2-7B (verify100@1)	7	No	OVM, Outcome-supervised Value Models for Planning in Mathematical Reasoning	2023-11-16	Code
110	ToRA-Code 7B	7	Yes	ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving	2023-09-29	Code
111	MuggleMATH 7B	7	Yes	MuggleMath: Assessing the Impact of Query and Response Augmentation on Math Reasoning	2023-10-09	Code
112	MathCoder-CL-7B	7	Yes	MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning	2023-10-05	Code
113	MetaMath 7B	7	Yes	MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models	2023-09-21	Code
114	MathCoder-L-7B	7	Yes	MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning	2023-10-05	Code
115	WizardMath-7B-V1.0	7	Yes	WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct	2023-08-18	Code
116	Mistral 7B (maj@8)	7	No	Mistral 7B	2023-10-10	Code
117	RFT 7B	7	Yes	Scaling Relationship on Learning Mathematical Reasoning with Large Language Models	2023-08-03	Code
118	Orca 2 7B	7	No	Orca 2: Teaching Small Language Models How to Reason	2023-11-18	-
119	Mistral 7B (on 100 first questions, 4-shot, auto-optimized prompting)	7	No	The Unreasonable Effectiveness of Eccentric Automatic Prompts	2024-02-09	-
120	Llemma 7B	7	No	Llemma: An Open Language Model For Mathematics	2023-10-16	Code
121	LLaMA 7B (maj1@k)	7	No	LLaMA: Open and Efficient Foundation Language Models	2023-02-27	Code
122	LLaMA 7B	7	No	LLaMA: Open and Efficient Foundation Language Models	2023-02-27	Code
123	Shivaay-4B (8-shot chain-of-thought)	4	No	-	-	-
124	Phi-GSM 2.7B (fine-tuned)	2.7	No	TinyGSM: achieving >80% on GSM8k with small language models	2023-12-14	-
125	GPT-Neo-2.7B + Self-Sampling	2.7	No	Learning Math Reasoning from Self-Sampled Correct and Partially-Correct Solutions	2022-05-28	Code
126	Phi-GSM+V 1.3B+1.3B (verify48@1)	2.6	No	TinyGSM: achieving >80% on GSM8k with small language models	2023-12-14	-
127	CodeT5+	0.77	No	CodeT5+: Open Code Large Language Models for Code Understanding and Generation	2023-05-13	Code
128	GPT-2-Medium 355M + question-solution classifier (BS=5)	0.355	No	Composing Ensembles of Pre-trained Models via Iterative Consensus	2022-10-20	-
129	GPT-2-Medium 355M (fine-tuned, BS=5)	0.355	No	Composing Ensembles of Pre-trained Models via Iterative Consensus	2022-10-20	-
130	GPT-2-Medium 355M + question-solution classifier (BS=1)	0.355	No	Composing Ensembles of Pre-trained Models via Iterative Consensus	2022-10-20	-
131	GPT-2-Medium 355M (BS=5)	0.355	No	Composing Ensembles of Pre-trained Models via Iterative Consensus	2022-10-20	-
132	GPT-Neo 125M + Self-Sampling	0.125	No	Learning Math Reasoning from Self-Sampled Correct and Partially-Correct Solutions	2022-05-28	Code
