Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Mathematical Question Answering on MAWPS

Metric: Accuracy (%) (higher is better)

Results

| # | Model | Accuracy (%) | Extra Data | Paper | Date | Code |
|---|-------|--------------|------------|-------|------|------|
| 1 | OpenMath-CodeLlama-70B (w/ code) | 95.7 | Yes | OpenMathInstruct-1: A 1.8 Million Math Instructi... | 2024-02-15 | Code |
| 2 | MsAT-DeductReasoner | 94.3 | No | Learning Multi-Step Reasoning by Solving Arithme... | 2023-06-02 | Code |
| 3 | ATHENA (roberta-large) | 93 | No | ATHENA: Mathematical Reasoning with Thought Expa... | 2023-11-02 | Code |
| 4 | Multi-view | 92.3 | Yes | Multi-View Reasoning: Consistent Contrastive Lea... | 2022-10-21 | Code |
| 5 | Exp-Tree | 92.3 | No | An Expression Tree Decoding Strategy for Mathema... | 2023-10-14 | Code |
| 6 | ATHENA (roberta-base) | 92.2 | No | ATHENA: Mathematical Reasoning with Thought Expa... | 2023-11-02 | Code |
| 7 | Roberta-DeductReasoner | 92 | No | Learning to Reason Deductively: Math Word Proble... | 2022-03-19 | Code |
| 8 | DeBERTa (PM + VM) | 91 | Yes | Math Word Problem Solving by Generating Linguist... | 2023-06-24 | Code |
| 9 | EPT | 88.7 | No | - | - | Code |
| 10 | Graph2Tree with RoBERTa | 88.7 | No | Are NLP Models really able to Solve Simple Math ... | 2021-03-12 | Code |
| 11 | GTS with RoBERTa | 88.5 | No | Are NLP Models really able to Solve Simple Math ... | 2021-03-12 | Code |
| 12 | GEO | 85.1 | No | - | - | - |
| 13 | EPT-X | 84.57 | No | - | - | Code |
| 14 | EPT | 84.51 | No | - | - | Code |
| 15 | Graph2Tree | 83.7 | No | - | - | Code |
| 16 | LLaMA 2-Chat | 82.4 | No | Llama 2: Open Foundation and Fine-Tuned Chat Mod... | 2023-07-18 | Code |
| 17 | GPT-3.5 turbo (175B) | 80.3 | No | Math Word Problem Solving by Generating Linguist... | 2023-06-24 | Code |
| 18 | Toolformer | 44 | No | - | - | - |
| 19 | GPT-3 (175B) | 19.8 | No | - | - | - |
| 20 | Toolformer (disabled) | 15 | No | - | - | - |
| 21 | GPT-J | 9.9 | No | Math Word Problem Solving by Generating Linguist... | 2023-06-24 | Code |
| 22 | GPT-J + CC | 9.3 | No | - | - | - |
| 23 | OPT (66B) | 7.9 | No | - | - | - |
| 24 | GPT-3 text-curie-001 (13B) | 4.09 | No | Math Word Problem Solving by Generating Linguist... | 2023-06-24 | Code |
| 25 | GPT-3 text-babbage-001 (6.7B) | 2.76 | No | Math Word Problem Solving by Generating Linguist... | 2023-06-24 | Code |