Common Sense Reasoning on BIG-bench (Known Unknowns)

Metric: Accuracy (higher is better)

# | Model | Accuracy | Extra Data | Paper | Date | Code
1 | PaLM-540B (few-shot, k=5) | 73.9 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code
2 | Chinchilla-70B (few-shot, k=5) | 65.2 | No | Training Compute-Optimal Large Language Models | 2022-03-29 | Code
3 | Gopher-280B (few-shot, k=5) | 63.6 | No | Scaling Language Models: Methods, Analysis & Ins... | 2021-12-08 | Code