Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Mathematical Question Answering on MATH

Metric: Parameters (billions); higher is better

Results

| # | Model | Parameters (billions) | Extra data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | Minerva 540B | 540 | No | Solving Quantitative Reasoning Problems with Lan... | 2022-06-29 | Code |
| 2 | Minerva 540B (5-shot) mCoT | 540 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 3 | PaLM 540B | 540 | No | Solving Quantitative Reasoning Problems with Lan... | 2022-06-29 | Code |
| 4 | PaLM 540B (5-shot) mCoT | 540 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 5 | davinci-002 175B | 175 | No | Solving Quantitative Reasoning Problems with Lan... | 2022-06-29 | Code |
| 6 | GPT-3-175B (few-shot) | 175 | No | Measuring Mathematical Problem Solving With the ... | 2021-03-05 | Code |
| 7 | GPT-3 175B (8-shot) | 175 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 8 | GAL 120B (5-shot) mCoT | 120 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 9 | GAL 120B <work> | 120 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 10 | Qwen2.5-Math-72B-Instruct(TIR,Greedy) | 72 | Yes | Qwen2.5-Math Technical Report: Toward Mathematic... | 2024-09-18 | - |
| 11 | Qwen2.5-Math-72B-Instruct(COT,Greedy) | 72 | Yes | Qwen2.5-Math Technical Report: Toward Mathematic... | 2024-09-18 | - |
| 12 | Qwen2-Math-72B-Instruct(greedy) | 72 | Yes | Qwen2 Technical Report | 2024-07-15 | Code |
| 13 | MMIQC-72B | 72 | Yes | Augmenting Math Word Problems via Iterative Ques... | 2024-01-17 | Code |
| 14 | OpenMath-CodeLlama-70B (w/ code, SC, k=50) | 70 | Yes | OpenMathInstruct-1: A 1.8 Million Math Instructi... | 2024-02-15 | Code |
| 15 | OpenMath-Llama2-70B (w/ code, SC, k=50) | 70 | Yes | OpenMathInstruct-1: A 1.8 Million Math Instructi... | 2024-02-15 | Code |
| 16 | ToRA 70B (w/ code, SC, k=50) | 70 | Yes | ToRA: A Tool-Integrated Reasoning Agent for Math... | 2023-09-29 | Code |
| 17 | DART-Math-Llama3-70B-Prop2Diff (0-shot CoT, w/o code) | 70 | Yes | DART-Math: Difficulty-Aware Rejection Tuning for... | 2024-06-18 | Code |
| 18 | DART-Math-Llama3-70B-Uniform (0-shot CoT, w/o code) | 70 | Yes | DART-Math: Difficulty-Aware Rejection Tuning for... | 2024-06-18 | Code |
| 19 | OpenMath-CodeLlama-70B (w/ code) | 70 | Yes | OpenMathInstruct-1: A 1.8 Million Math Instructi... | 2024-02-15 | Code |
| 20 | ToRA 70B (w/ code) | 70 | Yes | ToRA: A Tool-Integrated Reasoning Agent for Math... | 2023-09-29 | Code |
| 21 | OpenMath-Llama2-70B (w/ code) | 70 | Yes | OpenMathInstruct-1: A 1.8 Million Math Instructi... | 2024-02-15 | Code |
| 22 | MuggleMATH-70B | 70 | Yes | MuggleMath: Assessing the Impact of Query and Re... | 2023-10-09 | Code |
| 23 | MetaMath 70B | 70 | Yes | MetaMath: Bootstrap Your Own Mathematical Questi... | 2023-09-21 | Code |
| 24 | WizardMath-70B-V1.0 | 70 | Yes | WizardMath: Empowering Mathematical Reasoning fo... | 2023-08-18 | Code |
| 25 | Shepherd + DeepSeek-67B (SFT on MetaMATH + PRM rerank, k=256) | 67 | Yes | Math-Shepherd: Verify and Reinforce LLMs Step-by... | 2023-12-14 | Code |
| 26 | LLaMA 65B (maj1@k) | 65 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 27 | LLaMA 65B | 65 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 28 | Minerva 62B (maj5@256) | 62 | No | Solving Quantitative Reasoning Problems with Lan... | 2022-06-29 | Code |
| 29 | Minerva 62B (maj1@k, k=64) | 62 | No | Solving Quantitative Reasoning Problems with Lan... | 2022-06-29 | Code |
| 30 | Minerva 62B (4-shot) | 62 | No | Solving Quantitative Reasoning Problems with Lan... | 2022-06-29 | Code |
| 31 | PaLM 62B | 62 | No | Solving Quantitative Reasoning Problems with Lan... | 2022-06-29 | Code |
| 32 | OpenMath-CodeLlama-34B (w/ code, SC, k=50) | 34 | Yes | OpenMathInstruct-1: A 1.8 Million Math Instructi... | 2024-02-15 | Code |
| 33 | ToRA-Code 34B model (w/ code, SC, k=50) | 34 | Yes | ToRA: A Tool-Integrated Reasoning Agent for Math... | 2023-09-29 | Code |
| 34 | ToRA-Code 34B (w/ code) | 34 | Yes | ToRA: A Tool-Integrated Reasoning Agent for Math... | 2023-09-29 | Code |
| 35 | MMOS-CODE-34B(0-shot) | 34 | Yes | An Empirical Study of Data Ability Boundary in L... | 2024-02-23 | Code |
| 36 | Llemma-34B-KPMath-Plus | 34 | No | Key-Point-Driven Data Synthesis with its Enhance... | 2024-03-04 | - |
| 37 | OpenMath-CodeLlama-34B (w/ code) | 34 | Yes | OpenMathInstruct-1: A 1.8 Million Math Instructi... | 2024-02-15 | Code |
| 38 | MathCoder-CL-34B | 34 | Yes | MathCoder: Seamless Code Integration in LLMs for... | 2023-10-05 | Code |
| 39 | MathCoder-L-34B | 34 | Yes | MathCoder: Seamless Code Integration in LLMs for... | 2023-10-05 | Code |
| 40 | LLaMA 33B-maj1@k | 33 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 41 | LLaMA 33B | 33 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 42 | GAL 30B (5-shot) mCoT | 30 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 43 | GAL 30B <work> | 30 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 44 | OpenMath-CodeLlama-13B (w/ code, SC, k=50) | 13 | Yes | OpenMathInstruct-1: A 1.8 Million Math Instructi... | 2024-02-15 | Code |
| 45 | ToRA-Code 13B (w/ code) | 13 | Yes | ToRA: A Tool-Integrated Reasoning Agent for Math... | 2023-09-29 | Code |
| 46 | OpenMath-CodeLlama-13B (w/ code) | 13 | Yes | OpenMathInstruct-1: A 1.8 Million Math Instructi... | 2024-02-15 | Code |
| 47 | ToRA 13B (w/ code) | 13 | Yes | ToRA: A Tool-Integrated Reasoning Agent for Math... | 2023-09-29 | Code |
| 48 | Llama2-13B-KPMath-Plus | 13 | No | Key-Point-Driven Data Synthesis with its Enhance... | 2024-03-04 | - |
| 49 | MathCoder-CL-13B | 13 | Yes | MathCoder: Seamless Code Integration in LLMs for... | 2023-10-05 | Code |
| 50 | MuggleMATH-13B | 13 | Yes | MuggleMath: Assessing the Impact of Query and Re... | 2023-10-09 | Code |
| 51 | MathCoder-L-13B | 13 | Yes | MathCoder: Seamless Code Integration in LLMs for... | 2023-10-05 | Code |
| 52 | MetaMath 13B | 13 | Yes | MetaMath: Bootstrap Your Own Mathematical Questi... | 2023-09-21 | Code |
| 53 | WizardMath-13B-V1.0 | 13 | Yes | WizardMath: Empowering Mathematical Reasoning fo... | 2023-08-18 | Code |
| 54 | LLaMA 13B-maj1@k | 13 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 55 | GPT-3 13B | 13 | No | Measuring Mathematical Problem Solving With the ... | 2021-03-05 | Code |
| 56 | LLaMA 13B | 13 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 57 | GPT-3-13B (few-shot) | 13 | No | Measuring Mathematical Problem Solving With the ... | 2021-03-05 | Code |
| 58 | Minerva 8B (maj5@256) | 8 | No | Solving Quantitative Reasoning Problems with Lan... | 2022-06-29 | Code |
| 59 | DART-Math-Llama3-8B-Prop2Diff (0-shot CoT, w/o code) | 8 | Yes | DART-Math: Difficulty-Aware Rejection Tuning for... | 2024-06-18 | Code |
| 60 | DART-Math-Llama3-8B-Uniform (0-shot CoT, w/o code) | 8 | Yes | DART-Math: Difficulty-Aware Rejection Tuning for... | 2024-06-18 | Code |
| 61 | Minerva 8B (maj1@k, k=64) | 8 | No | Solving Quantitative Reasoning Problems with Lan... | 2022-06-29 | Code |
| 62 | Minerva 8B | 8 | No | Solving Quantitative Reasoning Problems with Lan... | 2022-06-29 | Code |
| 63 | PaLM 8B (fine-tuned) | 8 | No | Solving Quantitative Reasoning Problems with Lan... | 2022-06-29 | Code |
| 64 | PaLM 8B | 8 | No | Solving Quantitative Reasoning Problems with Lan... | 2022-06-29 | Code |
| 65 | Qwen2.5-Math-7B-Instruct(TIR,Greedy) | 7 | Yes | Qwen2.5-Math Technical Report: Toward Mathematic... | 2024-09-18 | - |
| 66 | Qwen2.5-Math-7B-Instruct(COT,Greedy) | 7 | Yes | Qwen2.5-Math Technical Report: Toward Mathematic... | 2024-09-18 | - |
| 67 | DAMOMath-7B | 7 | Yes | - | - | - |
| 68 | MMOS-DeepSeekMath-7B(0-shot,k=50) | 7 | Yes | An Empirical Study of Data Ability Boundary in L... | 2024-02-23 | Code |
| 69 | DeepSeekMATH-RL-7B (w/ code, greedy decoding) | 7 | Yes | DeepSeekMath: Pushing the Limits of Mathematical... | 2024-02-05 | Code |
| 70 | OpenMath-Mistral-7B (w/ code, SC, k=50) | 7 | Yes | OpenMathInstruct-1: A 1.8 Million Math Instructi... | 2024-02-15 | Code |
| 71 | OpenMath-CodeLlama-7B (w/ code, SC, k=50) | 7 | Yes | OpenMathInstruct-1: A 1.8 Million Math Instructi... | 2024-02-15 | Code |
| 72 | MMOS-DeepSeekMath-7B(0-shot) | 7 | Yes | An Empirical Study of Data Ability Boundary in L... | 2024-02-23 | Code |
| 73 | DART-Math-DSMath-7B-Prop2Diff (0-shot CoT, w/o code) | 7 | Yes | DART-Math: Difficulty-Aware Rejection Tuning for... | 2024-06-18 | Code |
| 74 | DART-Math-DSMath-7B-Uniform (0-shot CoT, w/o code) | 7 | Yes | DART-Math: Difficulty-Aware Rejection Tuning for... | 2024-06-18 | Code |
| 75 | DeepSeekMATH-RL-7B (greedy decoding) | 7 | Yes | DeepSeekMath: Pushing the Limits of Mathematical... | 2024-02-05 | Code |
| 76 | DeepSeekMath-7B-KPMath-Plus | 7 | No | Key-Point-Driven Data Synthesis with its Enhance... | 2024-03-04 | - |
| 77 | Mistral-7B-KPMath-Plus | 7 | Yes | Key-Point-Driven Data Synthesis with its Enhance... | 2024-03-04 | - |
| 78 | DART-Math-Mistral-7B-Prop2Diff (0-shot CoT, w/o code) | 7 | Yes | DART-Math: Difficulty-Aware Rejection Tuning for... | 2024-06-18 | Code |
| 79 | ToRA-Code 7B (w/ code) | 7 | Yes | ToRA: A Tool-Integrated Reasoning Agent for Math... | 2023-09-29 | Code |
| 80 | OpenMath-Mistral-7B (w/ code) | 7 | Yes | OpenMathInstruct-1: A 1.8 Million Math Instructi... | 2024-02-15 | Code |
| 81 | MMOS-CODE-7B(0-shot) | 7 | Yes | An Empirical Study of Data Ability Boundary in L... | 2024-02-23 | Code |
| 82 | OpenMath-CodeLlama-7B (w/ code) | 7 | Yes | OpenMathInstruct-1: A 1.8 Million Math Instructi... | 2024-02-15 | Code |
| 83 | Shepherd+Mistral-7B (SFT on MetaMATH + PRM RL+ PRM rerank, k=256) | 7 | Yes | Math-Shepherd: Verify and Reinforce LLMs Step-by... | 2023-12-14 | Code |
| 84 | DART-Math-Mistral-7B-Uniform (0-shot CoT, w/o code) | 7 | Yes | DART-Math: Difficulty-Aware Rejection Tuning for... | 2024-06-18 | Code |
| 85 | SFT-Mistral-7B | 7 | Yes | - | - | - |
| 86 | ToRA 7B (w/ code) | 7 | Yes | ToRA: A Tool-Integrated Reasoning Agent for Math... | 2023-09-29 | Code |
| 87 | Shepherd + Mistral-7B (SFT on MetaMATH + PRM RL) | 7 | Yes | Math-Shepherd: Verify and Reinforce LLMs Step-by... | 2023-12-14 | Code |
| 88 | WizardMath-7B-V1.1 | 7 | Yes | WizardMath: Empowering Mathematical Reasoning fo... | 2023-08-18 | Code |
| 89 | MathCoder-CL-7B | 7 | Yes | MathCoder: Seamless Code Integration in LLMs for... | 2023-10-05 | Code |
| 90 | OpenChat-3.5-1210 7B | 7 | No | OpenChat: Advancing Open-source Language Models ... | 2023-09-20 | Code |
| 91 | OpenChat-3.5 7B | 7 | No | OpenChat: Advancing Open-source Language Models ... | 2023-09-20 | Code |
| 92 | MuggleMATH 7B | 7 | Yes | MuggleMath: Assessing the Impact of Query and Re... | 2023-10-09 | Code |
| 93 | MathCoder-L-7B | 7 | Yes | MathCoder: Seamless Code Integration in LLMs for... | 2023-10-05 | Code |
| 94 | MetaMath 7B | 7 | Yes | MetaMath: Bootstrap Your Own Mathematical Questi... | 2023-09-21 | Code |
| 95 | Mistral 7B (maj@4) | 7 | No | Mixtral of Experts | 2024-01-08 | Code |
| 96 | Mistral 7B (maj@4) | 7 | No | Mixtral of Experts | 2024-01-08 | Code |
| 97 | WizardMath-7B-V1.0 | 7 | Yes | WizardMath: Empowering Mathematical Reasoning fo... | 2023-08-18 | Code |
| 98 | LLaMA 7B-maj1@k | 7 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 99 | LLaMA 7B | 7 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 100 | GPT-3 2.7B | 2.7 | No | Measuring Mathematical Problem Solving With the ... | 2021-03-05 | Code |
| 101 | Qwen2.5-Math-1.5B-Instruct(TIR,Greedy) | 1.5 | Yes | Qwen2.5-Math Technical Report: Toward Mathematic... | 2024-09-18 | - |
| 102 | Qwen2.5-Math-1.5B-Instruct(COT,Greedy) | 1.5 | Yes | Qwen2.5-Math Technical Report: Toward Mathematic... | 2024-09-18 | - |
| 103 | GPT-2 (1.5B) | 1.5 | No | Measuring Mathematical Problem Solving With the ... | 2021-03-05 | Code |
| 104 | GPT-2 (0.7B) | 0.7 | No | Measuring Mathematical Problem Solving With the ... | 2021-03-05 | Code |
| 105 | GPT-2 (0.3B) | 0.3 | No | Measuring Mathematical Problem Solving With the ... | 2021-03-05 | Code |
| 106 | GPT-2 (0.1B) | 0.1 | No | Measuring Mathematical Problem Solving With the ... | 2021-03-05 | Code |
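Several entries carry majority-voting settings in their names. Examples are maj1@k (Minerva, LLaMA) and "SC, k=50" (self-consistency, used by OpenMath and ToRA). This page does not define these labels. The sketch below assumes the usual scheme: sample k solutions per problem, take the most frequent final answer, and score that answer against the reference.

The function names `normalize`, `majority_vote`, and `maj1_at_k_accuracy` are hypothetical. The whitespace-only answer normalization is also a deliberate simplification. Real MATH evaluation compares LaTeX answers far more carefully.

```python
from collections import Counter
from typing import Sequence


def normalize(answer: str) -> str:
    # Hypothetical, deliberately minimal normalization: drop all whitespace.
    # Real MATH graders also canonicalize LaTeX (e.g. \frac{1}{2} vs 1/2).
    return "".join(answer.split())


def majority_vote(final_answers: Sequence[str]) -> str:
    """Return the most frequent normalized answer among k sampled solutions.

    Counter.most_common orders equal counts by first occurrence,
    so ties go to the answer that was sampled first.
    """
    if not final_answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(normalize(a) for a in final_answers)
    answer, _count = counts.most_common(1)[0]
    return answer


def maj1_at_k_accuracy(
    samples_per_problem: Sequence[Sequence[str]],
    references: Sequence[str],
) -> float:
    """Fraction of problems whose majority-voted answer matches the reference."""
    if len(samples_per_problem) != len(references):
        raise ValueError("one list of samples is required per reference answer")
    correct = sum(
        majority_vote(samples) == normalize(ref)
        for samples, ref in zip(samples_per_problem, references)
    )
    return correct / len(references)


if __name__ == "__main__":
    # Two toy problems with k = 3 sampled final answers each.
    samples = [["4", "4", "5"], ["3", "2", "2"]]
    refs = ["4", "3"]
    print(maj1_at_k_accuracy(samples, refs))  # 0.5: first problem right, second wrong
```

Majority voting only needs the final answers, not the reasoning traces. That is why it combines with both chain-of-thought and code-based ("w/ code") decoding in the entries above. Reranking with a reward model, as in the "PRM rerank, k=256" entries, is a different selection rule and is not what this sketch implements.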