Question Answering on StrategyQA

Metric: Accuracy (higher is better)


Results


#	Model	Accuracy (%)	Extra Data	SOTA	Paper	Date	Code
1	PaLM 2 (few-shot, CoT, SC)	90.4	No	Yes	PaLM 2 Technical Report	2023-05-17	Code
2	Rethinking with retrieval (GPT-3)	77.73	No	Yes	Rethinking with Retrieval: Faithful Large Language Model Inference	2022-12-31	Code
3	Self-Evaluation Guided Decoding (Codex, CoT, single reasoning chain, 6-shot gen, 4-shot eval)	77.2	No	No	-	-	-
4	U-PaLM 540B	76.6	No	Yes	Transcending Scaling Laws with 0.1% Extra Compute	2022-10-20	-
5	PaLM 540B	76.4	No	No	Transcending Scaling Laws with 0.1% Extra Compute	2022-10-20	-
6	Minerva 540B	61.9	No	No	Transcending Scaling Laws with 0.1% Extra Compute	2022-10-20	-