GEO

Reported on 12 benchmarks across 4 tasks

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Knowledge Base (6 results)

Mathematical Question Answering on MAWPS
Accuracy (%)
85.1
best: 95.7 (OpenMath-CodeLlama-70B (w/ code))
Mathematical Question Answering on ALG514
Accuracy (%)
82.1
best: 83 (MixedSP)
Mathematical Question Answering on DRAW-1K
Accuracy (%)
62.5
best: 63.5 (EPT)
Mathematical Reasoning on MAWPS
Accuracy (%)
85.1
best: 95.7 (OpenMath-CodeLlama-70B (w/ code))
Mathematical Reasoning on ALG514
Accuracy (%)
82.1
best: 83 (MixedSP)
Mathematical Reasoning on DRAW-1K
Accuracy (%)
62.5
best: 63.5 (EPT)

Question Answering on MAWPS
Accuracy (%)
85.1
best: 95.7 (OpenMath-CodeLlama-70B (w/ code))
Question Answering on ALG514
Accuracy (%)
82.1
best: 83 (MixedSP)
Question Answering on DRAW-1K
Accuracy (%)
62.5
best: 63.5 (EPT)

Math Word Problem Solving on MAWPS
Accuracy (%)
85.1
best: 95.7 (OpenMath-CodeLlama-70B (w/ code))
Math Word Problem Solving on ALG514
Accuracy (%)
82.1
best: 83 (MixedSP)
Math Word Problem Solving on DRAW-1K
Accuracy (%)
62.5
best: 63.5 (EPT)