GPT-3.5 turbo (175B)

Reported on 4 benchmarks across 4 tasks · 1 paper

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Knowledge Base · 2 results

Mathematical Question Answering on MAWPS
Accuracy (%) · 2023-06-24
80.3
best: 95.7 (OpenMath-CodeLlama-70B (w/ code))
Math Word Problem Solving by Generating Linguistic Variants of Problem Statements arXiv:2306.13899
Mathematical Reasoning on MAWPS
Accuracy (%) · 2023-06-24
80.3
best: 95.7 (OpenMath-CodeLlama-70B (w/ code))
Math Word Problem Solving by Generating Linguistic Variants of Problem Statements arXiv:2306.13899

Natural Language Processing · 1 result

Question Answering on MAWPS
Accuracy (%) · 2023-06-24
80.3
best: 95.7 (OpenMath-CodeLlama-70B (w/ code))
Math Word Problem Solving by Generating Linguistic Variants of Problem Statements arXiv:2306.13899

Reasoning · 1 result

Math Word Problem Solving on MAWPS
Accuracy (%) · 2023-06-24
80.3
best: 95.7 (OpenMath-CodeLlama-70B (w/ code))
Math Word Problem Solving by Generating Linguistic Variants of Problem Statements arXiv:2306.13899