MuggleMATH 13B

Reported on 2 benchmarks across 1 task · 1 paper

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Reasoning · 2 results

Arithmetic Reasoning on GSM8K
Accuracy · uses extra data · 2023-10-09
74
best: 97.72 (Claude 3.5 Sonnet (HPT))
MuggleMath: Assessing the Impact of Query and Response Augmentation on Math Reasoning · arXiv:2310.05506
Arithmetic Reasoning on GSM8K
Parameters (Billion) · uses extra data · 2023-10-09
13
best: 540 (PaLM 540B (Self Improvement, Self Consistency))
MuggleMath: Assessing the Impact of Query and Response Augmentation on Math Reasoning · arXiv:2310.05506