| 1 | Gemini 2.0 Flash Experimental | 89.7 | No | - | - | - |
| 2 | Qwen2.5-Math-72B-Instruct(TIR,Greedy) | 88.1 | Yes | Qwen2.5-Math Technical Report: Toward Mathematic... | 2024-09-18 | - |
| 3 | GPT-4 Turbo (MACM, w/code, voting) | 87.92 | No | MACM: Utilizing a Multi-Agent System for Conditi... | 2024-04-06 | Code |
| 4 | Qwen2.5-Math-72B-Instruct(COT,Greedy) | 85.9 | Yes | Qwen2.5-Math Technical Report: Toward Mathematic... | 2024-09-18 | - |
| 5 | Qwen2.5-Math-7B-Instruct(TIR,Greedy) | 85.2 | Yes | Qwen2.5-Math Technical Report: Toward Mathematic... | 2024-09-18 | - |
| 6 | GPT-4-code model (CSV, w/ code, SC, k=16) | 84.3 | No | Solving Challenging Math Word Problems Using GPT... | 2023-08-15 | Code |
| 7 | Qwen2-Math-72B-Instruct(greedy) | 84 | Yes | Qwen2 Technical Report | 2024-07-15 | Code |
| 8 | Qwen2.5-Math-7B-Instruct(COT,Greedy) | 83.6 | Yes | Qwen2.5-Math Technical Report: Toward Mathematic... | 2024-09-18 | - |
| 9 | Qwen2.5-Math-1.5B-Instruct(TIR,Greedy) | 79.9 | Yes | Qwen2.5-Math Technical Report: Toward Mathematic... | 2024-09-18 | - |
| 10 | OpenMath2-Llama3.1-70B (majority@256) | 79.6 | Yes | OpenMathInstruct-2: Accelerating AI for Math wit... | 2024-10-02 | Code |
| 11 | OpenMath2-Llama3.1-8B (majority@256) | 76.1 | Yes | OpenMathInstruct-2: Accelerating AI for Math wit... | 2024-10-02 | Code |
| 12 | Qwen2.5-Math-1.5B-Instruct(COT,Greedy) | 75.8 | Yes | Qwen2.5-Math Technical Report: Toward Mathematic... | 2024-09-18 | - |
| 13 | GPT-4-code model (CSV, w/ code) | 73.5 | No | Solving Challenging Math Word Problems Using GPT... | 2023-08-15 | Code |
| 14 | CR (GPT-4-turbo model, w/ code) | 72.2 | No | Cumulative Reasoning with Large Language Models | 2023-08-08 | Code |
| 15 | OpenMath2-Llama3.1-70B | 71.9 | Yes | OpenMathInstruct-2: Accelerating AI for Math wit... | 2024-10-02 | Code |
| 16 | LogicNet (with code interpreter) | 71.2 | Yes | Solving Challenging Math Word Problems Using GPT... | 2023-08-15 | Code |
| 17 | Qwen2-72B-Instruct-Step-DPO (0-shot CoT, w/o code) | 70.8 | Yes | Step-DPO: Step-wise Preference Optimization for ... | 2024-06-26 | Code |
| 18 | GPT-4-code model (w/ code) | 69.7 | No | Solving Challenging Math Word Problems Using GPT... | 2023-08-15 | Code |
| 19 | OpenMath2-Llama3.1-8B | 67.8 | Yes | OpenMathInstruct-2: Accelerating AI for Math wit... | 2024-10-02 | Code |
| 20 | AlphaMath-7B-SBS@3 | 66.3 | No | AlphaMath Almost Zero: Process Supervision witho... | 2024-05-06 | Code |
| 21 | Minerva 62B (maj5@256) | 64.9 | No | Solving Quantitative Reasoning Problems with Lan... | 2022-06-29 | Code |
| 22 | DAMOMath-7B | 64.5 | Yes | - | - | - |
| 23 | MMOS-DeepSeekMath-7B(0-shot,k=50) | 63.7 | Yes | An Empirical Study of Data Ability Boundary in L... | 2024-02-23 | Code |
| 24 | GPT-4-code model (w/o code) | 60.8 | No | Solving Challenging Math Word Problems Using GPT... | 2023-08-15 | Code |
| 25 | OpenMath-CodeLlama-70B (w/ code, SC, k=50) | 60.4 | Yes | OpenMathInstruct-1: A 1.8 Million Math Instructi... | 2024-02-15 | Code |
| 26 | OpenMath-CodeLlama-34B (w/ code, SC, k=50) | 60.2 | Yes | OpenMathInstruct-1: A 1.8 Million Math Instructi... | 2024-02-15 | Code |
| 27 | ToRA-Code 34B model (w/ code, SC, k=50) | 60 | Yes | ToRA: A Tool-Integrated Reasoning Agent for Math... | 2023-09-29 | Code |
| 28 | DeepSeekMATH-RL-7B (w/ code, greedy decoding) | 58.8 | Yes | DeepSeekMath: Pushing the Limits of Mathematical... | 2024-02-05 | Code |
| 29 | OpenMath-Llama2-70B (w/ code, SC, k=50) | 58.3 | Yes | OpenMathInstruct-1: A 1.8 Million Math Instructi... | 2024-02-15 | Code |
| 30 | CR (GPT-4 model, w/o code) | 58 | No | Cumulative Reasoning with Large Language Models | 2023-08-08 | Code |
| 31 | OpenMath-CodeLlama-13B (w/ code, SC, k=50) | 57.6 | Yes | OpenMathInstruct-1: A 1.8 Million Math Instructi... | 2024-02-15 | Code |
| 32 | OpenMath-Mistral-7B (w/ code, SC, k=50) | 57.2 | Yes | OpenMathInstruct-1: A 1.8 Million Math Instructi... | 2024-02-15 | Code |
| 33 | ToRA 70B (w/ code, SC, k=50) | 56.9 | Yes | ToRA: A Tool-Integrated Reasoning Agent for Math... | 2023-09-29 | Code |
| 34 | SKiC (GPT-4 model) | 56.4 | No | Skills-in-Context Prompting: Unlocking Compositi... | 2023-08-01 | - |
| 35 | DART-Math-Llama3-70B-Prop2Diff (0-shot CoT, w/o code) | 56.1 | Yes | DART-Math: Difficulty-Aware Rejection Tuning for... | 2024-06-18 | Code |
| 36 | OpenMath-CodeLlama-7B (w/ code, SC, k=50) | 55.6 | Yes | OpenMathInstruct-1: A 1.8 Million Math Instructi... | 2024-02-15 | Code |
| 37 | MMOS-DeepSeekMath-7B(0-shot) | 55 | Yes | An Empirical Study of Data Ability Boundary in L... | 2024-02-23 | Code |
| 38 | DART-Math-Llama3-70B-Uniform (0-shot CoT, w/o code) | 54.9 | Yes | DART-Math: Difficulty-Aware Rejection Tuning for... | 2024-06-18 | Code |
| 39 | PHP (GPT-4 model) | 53.9 | No | Progressive-Hint Prompting Improves Reasoning in... | 2023-04-19 | Code |
| 40 | DART-Math-DSMath-7B-Prop2Diff (0-shot CoT, w/o code) | 53.6 | Yes | DART-Math: Difficulty-Aware Rejection Tuning for... | 2024-06-18 | Code |
| 41 | Gemini Ultra (4-shot) | 53.2 | No | Gemini: A Family of Highly Capable Multimodal Mo... | 2023-12-19 | Code |
| 42 | DART-Math-DSMath-7B-Uniform (0-shot CoT, w/o code) | 52.9 | Yes | DART-Math: Difficulty-Aware Rejection Tuning for... | 2024-06-18 | Code |
| 43 | GPT-4 model (w/ code, PAL) | 51.8 | No | PAL: Program-aided Language Models | 2022-11-18 | Code |
| 44 | DeepSeekMATH-RL-7B (greedy decoding) | 51.7 | Yes | DeepSeekMath: Pushing the Limits of Mathematical... | 2024-02-05 | Code |
| 45 | AlphaLLM (with MCTS) | 51 | No | Toward Self-Improvement of LLMs via Imagination,... | 2024-04-18 | Code |
| 46 | ToRA-Code 34B (w/ code) | 50.8 | Yes | ToRA: A Tool-Integrated Reasoning Agent for Math... | 2023-09-29 | Code |
| 47 | OpenMath-CodeLlama-70B (w/ code) | 50.7 | Yes | OpenMathInstruct-1: A 1.8 Million Math Instructi... | 2024-02-15 | Code |
| 48 | Minerva 540B (maj1@k, k=64) | 50.3 | No | Solving Quantitative Reasoning Problems with Lan... | 2022-06-29 | Code |
| 49 | ToRA 70B (w/ code) | 49.7 | Yes | ToRA: A Tool-Integrated Reasoning Agent for Math... | 2023-09-29 | Code |
| 50 | MMOS-CODE-34B(0-shot) | 49.5 | Yes | An Empirical Study of Data Ability Boundary in L... | 2024-02-23 | Code |
| 51 | DeepSeekMath-7B-KPMath-Plus | 48.8 | No | Key-Point-Driven Data Synthesis with its Enhance... | 2024-03-04 | - |
| 52 | PaLM 2 (few-shot, k=4, SC) | 48.8 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 53 | Llemma-34B-KPMath-Plus | 48.6 | No | Key-Point-Driven Data Synthesis with its Enhance... | 2024-03-04 | - |
| 54 | OpenMath-CodeLlama-34B (w/ code) | 48.3 | Yes | OpenMathInstruct-1: A 1.8 Million Math Instructi... | 2024-02-15 | Code |
| 55 | Shepherd + DeepSeek-67B (SFT on MetaMATH + PRM rerank, k=256) | 48.1 | Yes | Math-Shepherd: Verify and Reinforce LLMs Step-by... | 2023-12-14 | Code |
| 56 | ToRA-Code 13B (w/ code) | 48.1 | Yes | ToRA: A Tool-Integrated Reasoning Agent for Math... | 2023-09-29 | Code |
| 57 | Minerva 8B (maj5@256) | 47.6 | No | Solving Quantitative Reasoning Problems with Lan... | 2022-06-29 | Code |
| 58 | Mistral-7B-KPMath-Plus | 46.8 | Yes | Key-Point-Driven Data Synthesis with its Enhance... | 2024-03-04 | - |
| 59 | DART-Math-Llama3-8B-Prop2Diff (0-shot CoT, w/o code) | 46.6 | Yes | DART-Math: Difficulty-Aware Rejection Tuning for... | 2024-06-18 | Code |
| 60 | OpenMath-Llama2-70B (w/ code) | 46.3 | Yes | OpenMathInstruct-1: A 1.8 Million Math Instructi... | 2024-02-15 | Code |
| 61 | OpenMath-CodeLlama-13B (w/ code) | 45.5 | Yes | OpenMathInstruct-1: A 1.8 Million Math Instructi... | 2024-02-15 | Code |
| 62 | DART-Math-Mistral-7B-Prop2Diff (0-shot CoT, w/o code) | 45.5 | Yes | DART-Math: Difficulty-Aware Rejection Tuning for... | 2024-06-18 | Code |
| 63 | DART-Math-Llama3-8B-Uniform (0-shot CoT, w/o code) | 45.3 | Yes | DART-Math: Difficulty-Aware Rejection Tuning for... | 2024-06-18 | Code |
| 64 | MathCoder-CL-34B | 45.2 | Yes | MathCoder: Seamless Code Integration in LLMs for... | 2023-10-05 | Code |
| 65 | MathCoder-L-34B | 45.1 | Yes | MathCoder: Seamless Code Integration in LLMs for... | 2023-10-05 | Code |
| 66 | MMIQC-72B | 45 | Yes | Augmenting Math Word Problems via Iterative Ques... | 2024-01-17 | Code |
| 67 | ToRA-Code 7B (w/ code) | 44.6 | Yes | ToRA: A Tool-Integrated Reasoning Agent for Math... | 2023-09-29 | Code |
| 68 | OpenMath-Mistral-7B (w/ code) | 44.5 | Yes | OpenMathInstruct-1: A 1.8 Million Math Instructi... | 2024-02-15 | Code |
| 69 | MMOS-CODE-7B(0-shot) | 44.3 | Yes | An Empirical Study of Data Ability Boundary in L... | 2024-02-23 | Code |
| 70 | OpenMath-CodeLlama-7B (w/ code) | 43.6 | Yes | OpenMathInstruct-1: A 1.8 Million Math Instructi... | 2024-02-15 | Code |
| 71 | Shepherd + Mistral-7B (SFT on MetaMATH + PRM RL + PRM rerank, k=256) | 43.5 | Yes | Math-Shepherd: Verify and Reinforce LLMs Step-by... | 2023-12-14 | Code |
| 72 | DART-Math-Mistral-7B-Uniform (0-shot CoT, w/o code) | 43.5 | Yes | DART-Math: Difficulty-Aware Rejection Tuning for... | 2024-06-18 | Code |
| 73 | Minerva 62B (maj1@k, k=64) | 43.4 | No | Solving Quantitative Reasoning Problems with Lan... | 2022-06-29 | Code |
| 74 | ToRA 13B (w/ code) | 43 | Yes | ToRA: A Tool-Integrated Reasoning Agent for Math... | 2023-09-29 | Code |
| 75 | GPT-4 | 42.5 | No | Sparks of Artificial General Intelligence: Early... | 2023-03-22 | Code |
| 76 | SFT-Mistral-7B | 41.8 | Yes | - | - | - |
| 77 | Llama2-13B-KPMath-Plus | 41 | No | Key-Point-Driven Data Synthesis with its Enhance... | 2024-03-04 | - |
| 78 | ToRA 7B (w/ code) | 40.1 | Yes | ToRA: A Tool-Integrated Reasoning Agent for Math... | 2023-09-29 | Code |
| 79 | MathCoder-CL-13B | 35.9 | Yes | MathCoder: Seamless Code Integration in LLMs for... | 2023-10-05 | Code |
| 80 | MuggleMATH-70B | 35.6 | Yes | MuggleMath: Assessing the Impact of Query and Re... | 2023-10-09 | Code |
| 81 | PaLM 2 (few-shot, k=4, CoT) | 34.3 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 82 | Minerva 540B | 33.6 | No | Solving Quantitative Reasoning Problems with Lan... | 2022-06-29 | Code |
| 83 | Minerva 540B (5-shot) mCoT | 33.6 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 84 | Shepherd + Mistral-7B (SFT on MetaMATH + PRM RL) | 33 | Yes | Math-Shepherd: Verify and Reinforce LLMs Step-by... | 2023-12-14 | Code |
| 85 | WizardMath-7B-V1.1 | 33 | Yes | WizardMath: Empowering Mathematical Reasoning fo... | 2023-08-18 | Code |
| 86 | Gemini Pro (4-shot) | 32.6 | No | Gemini: A Family of Highly Capable Multimodal Mo... | 2023-12-19 | Code |
| 87 | MuggleMATH-13B | 30.7 | Yes | MuggleMath: Assessing the Impact of Query and Re... | 2023-10-09 | Code |
| 88 | MathCoder-CL-7B | 30.2 | Yes | MathCoder: Seamless Code Integration in LLMs for... | 2023-10-05 | Code |
| 89 | MathCoder-L-13B | 29.9 | Yes | MathCoder: Seamless Code Integration in LLMs for... | 2023-10-05 | Code |
| 90 | Qwen2idae-16x14B (4-shot) | 29.9 | No | Parameter-Efficient Sparsity Crafting from Dense... | 2024-01-05 | Code |
| 91 | OpenChat-3.5-1210 7B | 28.9 | No | OpenChat: Advancing Open-source Language Models ... | 2023-09-20 | Code |
| 92 | OpenChat-3.5 7B | 28.6 | No | OpenChat: Advancing Open-source Language Models ... | 2023-09-20 | Code |
| 93 | Mixtral 8x7B (maj@4) | 28.4 | No | Mixtral of Experts | 2024-01-08 | Code |
| 94 | Minerva 62B (4-shot) | 27.6 | No | Solving Quantitative Reasoning Problems with Lan... | 2022-06-29 | Code |
| 95 | MetaMath 70B | 26 | Yes | MetaMath: Bootstrap Your Own Mathematical Questi... | 2023-09-21 | Code |
| 96 | MuggleMATH 7B | 25.8 | Yes | MuggleMath: Assessing the Impact of Query and Re... | 2023-10-09 | Code |
| 97 | Minerva 8B (maj1@k, k=64) | 25.4 | No | Solving Quantitative Reasoning Problems with Lan... | 2022-06-29 | Code |
| 98 | MathCoder-L-7B | 23.3 | Yes | MathCoder: Seamless Code Integration in LLMs for... | 2023-10-05 | Code |
| 99 | WizardMath-70B-V1.0 | 22.7 | Yes | WizardMath: Empowering Mathematical Reasoning fo... | 2023-08-18 | Code |
| 100 | Camelidae-8×34B (4-shot) | 22.6 | No | Parameter-Efficient Sparsity Crafting from Dense... | 2024-01-05 | Code |
| 101 | MetaMath 13B | 22.5 | Yes | MetaMath: Bootstrap Your Own Mathematical Questi... | 2023-09-21 | Code |
| 102 | LLaMA 65B (maj1@k) | 20.5 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 103 | GAL 120B (5-shot) mCoT | 20.4 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 104 | MetaMath 7B | 19.4 | Yes | MetaMath: Bootstrap Your Own Mathematical Questi... | 2023-09-21 | Code |
| 105 | davinci-002 175B | 19.1 | No | Solving Quantitative Reasoning Problems with Lan... | 2022-06-29 | Code |
| 106 | Branch-Train-MiX 4x7B (sampling top-2 experts) | 17.8 | No | Branch-Train-MiX: Mixing Expert LLMs into a Mixt... | 2024-03-12 | Code |
| 107 | GAL 120B `<work>` | 16.6 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 108 | LLaMA 33B-maj1@k | 15.2 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 109 | Minerva 8B | 14.1 | No | Solving Quantitative Reasoning Problems with Lan... | 2022-06-29 | Code |
| 110 | WizardMath-13B-V1.0 | 14 | Yes | WizardMath: Empowering Mathematical Reasoning fo... | 2023-08-18 | Code |
| 111 | Mistral 7B (maj@4) | 13.1 | No | Mistral 7B | 2023-10-10 | Code |
| 112 | GAL 30B (5-shot) mCoT | 12.7 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 113 | Mistral 7B (maj@4) | 12.7 | No | Mixtral of Experts | 2024-01-08 | Code |
| 114 | GAL 30B `<work>` | 11.4 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 115 | WizardMath-7B-V1.0 | 10.7 | Yes | WizardMath: Empowering Mathematical Reasoning fo... | 2023-08-18 | Code |
| 116 | LLaMA 65B | 10.6 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 117 | PaLM 540B | 8.8 | No | Solving Quantitative Reasoning Problems with Lan... | 2022-06-29 | Code |
| 118 | PaLM 540B (5-shot) mCoT | 8.8 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 119 | LLaMA 13B-maj1@k | 8.8 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 120 | LLaMA 33B | 7.1 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 121 | LLaMA 7B-maj1@k | 6.9 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 122 | GPT-2 (1.5B) | 6.9 | No | Measuring Mathematical Problem Solving With the ... | 2021-03-05 | Code |
| 123 | GPT-2 (0.7B) | 6.4 | No | Measuring Mathematical Problem Solving With the ... | 2021-03-05 | Code |
| 124 | GPT-2 (0.3B) | 6.2 | No | Measuring Mathematical Problem Solving With the ... | 2021-03-05 | Code |
| 125 | GPT-3 13B | 5.6 | No | Measuring Mathematical Problem Solving With the ... | 2021-03-05 | Code |
| 126 | PaLM 8B (fine-tuned) | 5.6 | No | Solving Quantitative Reasoning Problems with Lan... | 2022-06-29 | Code |
| 127 | GPT-2 (0.1B) | 5.4 | No | Measuring Mathematical Problem Solving With the ... | 2021-03-05 | Code |
| 128 | GPT-3-175B (few-shot) | 5.2 | No | Measuring Mathematical Problem Solving With the ... | 2021-03-05 | Code |
| 129 | GPT-3 175B (8-shot) | 5.2 | No | Galactica: A Large Language Model for Science | 2022-11-16 | Code |
| 130 | PaLM 62B | 4.4 | No | Solving Quantitative Reasoning Problems with Lan... | 2022-06-29 | Code |
| 131 | LLaMA 13B | 3.9 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 132 | GPT-3-13B (few-shot) | 3 | No | Measuring Mathematical Problem Solving With the ... | 2021-03-05 | Code |
| 133 | LLaMA 7B | 2.9 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 134 | GPT-3 2.7B | 2.9 | No | Measuring Mathematical Problem Solving With the ... | 2021-03-05 | Code |
| 135 | PaLM 8B | 1.5 | No | Solving Quantitative Reasoning Problems with Lan... | 2022-06-29 | Code |