Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

PaLM 540B (Self Improvement, Standard-Prompting)

Reported on 7 benchmarks across 4 tasks · 1 paper

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Natural Language Processing · 5 results

Question Answering on DROP
Accuracy · 2022-10-20
71.7
best: 83 (PaLM 540B (Self Improvement, Self Consistency))
Large Language Models Can Self-Improve arXiv:2210.11610
Question Answering on OpenBookQA
Accuracy · 2022-10-20
92
best: 95.9 (GPT-4 + knowledge base)
Large Language Models Can Self-Improve arXiv:2210.11610
Common Sense Reasoning on ARC (Challenge)
Accuracy · 2022-10-20
87.2
best: 96.4 (GPT-4 (few-shot, k=25))
Large Language Models Can Self-Improve arXiv:2210.11610
Natural Language Inference on ANLI test
A2 (round 2) · 2022-10-20
64.8
best: 72.5 (T5-3B (explanation prompting))
Large Language Models Can Self-Improve arXiv:2210.11610
Natural Language Inference on ANLI test
A3 (round 3) · 2022-10-20
66.9
best: 74.8 (T5-3B (explanation prompting))
Large Language Models Can Self-Improve arXiv:2210.11610

Reasoning · 2 results

Arithmetic Reasoning on GSM8K
Accuracy · 2022-10-20
32.2
best: 97.72 (Claude 3.5 Sonnet (HPT))
Large Language Models Can Self-Improve arXiv:2210.11610
Arithmetic Reasoning on GSM8K
Parameters (Billion) · 2022-10-20
540
Large Language Models Can Self-Improve arXiv:2210.11610