PaLM 2-M (1-shot)

Reported on 9 benchmarks across 4 tasks · 1 paper

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Natural Language Processing · 9 results

Question Answering on COPA
Accuracy · 2023-05-17
90
best: 100 (PaLM 540B (finetuned))
PaLM 2 Technical Report arXiv:2305.10403
Question Answering on PIQA
Accuracy · 2023-05-17
83.2
best: 90.1 (Unicorn 11B (fine-tuned))
PaLM 2 Technical Report arXiv:2305.10403
Question Answering on BoolQ
Accuracy · 2023-05-17
88.6
best: 99.87 (Mistral-Nemo 12B (HPT))
PaLM 2 Technical Report arXiv:2305.10403
Question Answering on OpenBookQA
Accuracy · 2023-05-17
56.2
best: 95.9 (GPT-4 + knowledge base)
PaLM 2 Technical Report arXiv:2305.10403
Common Sense Reasoning on WinoGrande
Accuracy · 2023-05-17
79.2
best: 96.1 (ST-MoE-32B 269B (fine-tuned))
PaLM 2 Technical Report arXiv:2305.10403
Common Sense Reasoning on ARC (Challenge)
Accuracy · 2023-05-17
64.9
best: 96.4 (GPT-4 (few-shot, k=25))
PaLM 2 Technical Report arXiv:2305.10403
Common Sense Reasoning on ARC (Easy)
Accuracy · 2023-05-17
88
best: 95.2 (ST-MoE-32B 269B (fine-tuned))
PaLM 2 Technical Report arXiv:2305.10403
Coreference Resolution on Winograd Schema Challenge
Accuracy · 2023-05-17
88.1
best: 100 (PaLM 540B (fine-tuned))
PaLM 2 Technical Report arXiv:2305.10403
Sentence Completion on HellaSwag
Accuracy · 2023-05-17
86.7
best: 96.1 (CompassMTL 567M with Tailor)
PaLM 2 Technical Report arXiv:2305.10403