Mathematical Reasoning on Lila (IID)

Metric: Accuracy (higher is better)
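
As a rough illustration of the metric, accuracy can be read as the fraction of problems whose predicted answer matches the reference answer. The sketch below assumes exact string match after trimming whitespace; that matching rule and the function name `accuracy` are assumptions made for illustration, not the benchmark's official scoring procedure.

```python
def accuracy(predictions, references):
    """Fraction of predictions that exactly match their reference answer.

    Assumption: answers are compared as whitespace-trimmed strings.
    The benchmark's own scorer may normalize answers differently.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count the positions where the trimmed prediction equals the trimmed reference.
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references)
```

Under this reading, a score of 0.604 corresponds to roughly 60 correct answers per 100 problems.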

Results

#	Model	Accuracy	Extra Data	Paper	Date	Code
1	Codex (Few-Shot, 175B)	0.604	No	Lila: A Unified Benchmark for Mathematical Reaso...	2022-10-31	Code
2	Bhāskara-P (Fine-tuned, 2.7B)	0.48	No	Lila: A Unified Benchmark for Mathematical Reaso...	2022-10-31	Code
3	Neo-P (Fine-tuned, 2.7B)	0.394	No	Lila: A Unified Benchmark for Mathematical Reaso...	2022-10-31	Code
4	GPT-3 (Few-Shot, 175B)	0.384	No	Lila: A Unified Benchmark for Mathematical Reaso...	2022-10-31	Code
5	Bhāskara-A (Fine-tuned, 2.7B)	0.252	No	Lila: A Unified Benchmark for Mathematical Reaso...	2022-10-31	Code
6	Neo-A (Fine-tuned, 2.7B)	0.204	No	Lila: A Unified Benchmark for Mathematical Reaso...	2022-10-31	Code

All six results are reported in "Lila: A Unified Benchmark for Mathematical Reasoning" (2022-10-31). Codex (Few-Shot, 175B) is marked SOTA with an accuracy of 0.604.