Phi-GSM 2.7B (fine-tuned)

Reported on 2 benchmarks across 1 task · 1 paper

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Reasoning · 2 results

Arithmetic Reasoning on GSM8K
Accuracy (%) · 2023-12-14
74.3
best: 97.72 (Claude 3.5 Sonnet (HPT))
TinyGSM: achieving >80% on GSM8k with small language models arXiv:2312.09241
Arithmetic Reasoning on GSM8K
Parameters (billions) · 2023-12-14
2.7
best: 540 (PaLM 540B (Self Improvement, Self Consistency))
TinyGSM: achieving >80% on GSM8k with small language models arXiv:2312.09241