Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Math Word Problem Solving on SVAMP

Metric: Execution Accuracy (higher is better)
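The metric can be illustrated with a minimal sketch, assuming each model prediction is an arithmetic expression that is executed and compared with the gold numeric answer. The function names, the tolerance, and the treatment of invalid expressions as incorrect are assumptions made for illustration, not the scoring code behind this leaderboard.

```python
import ast
import operator

# Binary operators permitted in a predicted expression (illustrative choice).
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate_expression(expression):
    """Evaluate an arithmetic expression such as '(8 - 3) * 2' without eval()."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return float(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval"))


def execution_accuracy(predictions, gold_answers, tolerance=1e-4):
    """Return the percentage of predictions whose executed value matches gold.

    Assumption: expressions that fail to parse or execute count as incorrect.
    """
    if len(predictions) != len(gold_answers):
        raise ValueError("predictions and gold_answers must have equal length")
    if not predictions:
        return 0.0
    correct = 0
    for expression, gold in zip(predictions, gold_answers):
        try:
            value = evaluate_expression(expression)
        except (ValueError, SyntaxError, ZeroDivisionError):
            continue  # unexecutable prediction, scored as wrong
        if abs(value - gold) <= tolerance:
            correct += 1
    return 100.0 * correct / len(predictions)


if __name__ == "__main__":
    # Hypothetical predictions and answers, not drawn from SVAMP.
    predicted = ["8 - 3", "2 * (4 + 1)", "7 / 0", "3 +"]
    gold = [5, 10, 1, 3]
    print(execution_accuracy(predicted, gold))
```

In this toy run, two of the four hypothetical predictions execute to the gold answer, while the division by zero and the malformed expression are scored as wrong.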


Results

| # | Model | Execution Accuracy | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | GPT-4 (Teaching-Inspired) | 93.9 | No | Teaching-Inspired Integrated Prompting Framework... | 2024-10-10 | Code |
| 2 | GPT-4 (Model Selection) | 93.7 | No | Automatic Model Selection with Large Language Mo... | 2023-05-23 | Code |
| 3 | Qwen2(CoT + Code Interpreter) | 92.3 | No | - | - | - |
| 4 | GPT-4 (PHP) | 91.9 | No | Progressive-Hint Prompting Improves Reasoning in... | 2023-04-19 | Code |
| 5 | OpenMath-CodeLlama-70B (w/ code) | 87.8 | Yes | OpenMathInstruct-1: A 1.8 Million Math Instructi... | 2024-02-15 | Code |
| 6 | MathCoder-L-70B | 84.9 | Yes | MathCoder: Seamless Code Integration in LLMs for... | 2023-10-05 | Code |
| 7 | PoT_Eng (self-consistency @ 5) | 83.7 | No | - | - | Code |
| 8 | CoT_Eng (self-consistency @ 5) | 82.5 | No | - | - | Code |
| 9 | MMOS-CODE-34B(0-shot) | 80.6 | Yes | An Empirical Study of Data Ability Boundary in L... | 2024-02-23 | Code |
| 10 | MMOS-DeepSeekMath-7B(0-shot) | 79.3 | Yes | An Empirical Study of Data Ability Boundary in L... | 2024-02-23 | Code |
| 11 | MMOS-CODE-7B(0-shot) | 76.4 | Yes | An Empirical Study of Data Ability Boundary in L... | 2024-02-23 | Code |
| 12 | LLaMA 2-Chat | 69.2 | No | Llama 2: Open Foundation and Fine-Tuned Chat Mod... | 2023-07-18 | Code |
| 13 | DeBERTa | 63.5 | No | Math Word Problem Solving by Generating Linguist... | 2023-06-24 | Code |
| 14 | PaLM (zero-shot, CoT) | 62.1 | No | Large Language Models are Zero-Shot Reasoners | 2022-05-24 | Code |
| 15 | PaLM (zero-shot) | 58.8 | No | Large Language Models are Zero-Shot Reasoners | 2022-05-24 | Code |
| 16 | SYRELM (Vicuna 13B) | 56.65 | Yes | Frugal LMs Trained to Invoke Symbolic Solvers Ac... | 2023-12-09 | Code |
| 17 | ATHENA (roberta-large) | 54.8 | No | ATHENA: Mathematical Reasoning with Thought Expa... | 2023-11-02 | Code |
| 18 | MsAT-DeductReasoner | 48.9 | No | Learning Multi-Step Reasoning by Solving Arithme... | 2023-06-02 | Code |
| 19 | Roberta-DeductReasoner | 47.3 | No | Learning to Reason Deductively: Math Word Proble... | 2022-03-19 | Code |
| 20 | ATHENA (roberta-base) | 45.6 | No | ATHENA: Mathematical Reasoning with Thought Expa... | 2023-11-02 | Code |
| 21 | Graph2Tree with RoBERTa | 43.8 | Yes | Are NLP Models really able to Solve Simple Math ... | 2021-03-12 | Code |
| 22 | GTS with RoBERTa | 41 | Yes | Are NLP Models really able to Solve Simple Math ... | 2021-03-12 | Code |
| 23 | LSTM Seq2Seq with RoBERTa | 40.3 | Yes | Are NLP Models really able to Solve Simple Math ... | 2021-03-12 | Code |
| 24 | SYRELM (GPT-J) | 40.1 | Yes | Frugal LMs Trained to Invoke Symbolic Solvers Ac... | 2023-12-09 | Code |
| 25 | Transformer with RoBERTa | 38.9 | Yes | Are NLP Models really able to Solve Simple Math ... | 2021-03-12 | Code |