Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Common Sense Reasoning on BIG-bench

Metric: Accuracy (higher is better)
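The page does not state the exact scoring rule. Assuming the standard definition (the share of evaluated examples a model answers correctly, reported as a percentage), the metric can be written as:

```latex
\text{Accuracy} = 100 \times \frac{N_{\text{correct}}}{N_{\text{total}}}
```

Here $N_{\text{correct}}$ and $N_{\text{total}}$ are illustrative symbols, not names used by the leaderboard. Under this reading, a score of 86.86 means roughly 86.86 correct answers per 100 evaluated examples.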


Results

| Rank | Model | Accuracy (%) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | Orca 2-13B | 86.86 | No | Orca 2: Teaching Small Language Models How to Re... | 2023-11-18 | - |
| 2 | Chinchilla-70B (few-shot, k=5) | 85.7 | No | Training Compute-Optimal Large Language Models | 2022-03-29 | Code |
| 3 | Orca 2-7B | 84.31 | No | Orca 2: Teaching Small Language Models How to Re... | 2023-11-18 | - |
| 4 | Chinchilla-70B (few-shot, k=5) | 75 | No | Training Compute-Optimal Large Language Models | 2022-03-29 | Code |
| 5 | Chinchilla-70B (few-shot, k=5) | 73 | No | Training Compute-Optimal Large Language Models | 2022-03-29 | Code |
| 6 | Gopher-280B (few-shot, k=5) | 69.7 | No | Scaling Language Models: Methods, Analysis & Ins... | 2021-12-08 | Code |
| 7 | Chinchilla-70B (few-shot, k=5) | 68.8 | No | Training Compute-Optimal Large Language Models | 2022-03-29 | Code |
| 8 | Gopher-280B (few-shot, k=5) | 68.2 | No | Scaling Language Models: Methods, Analysis & Ins... | 2021-12-08 | Code |
| 9 | Chinchilla-70B (few-shot, k=5) | 67.7 | No | Training Compute-Optimal Large Language Models | 2022-03-29 | Code |
| 10 | Gopher-280B (few-shot, k=5) | 56.8 | No | Scaling Language Models: Methods, Analysis & Ins... | 2021-12-08 | Code |
| 11 | Gopher-280B (few-shot, k=5) | 52.5 | No | Scaling Language Models: Methods, Analysis & Ins... | 2021-12-08 | Code |
| 12 | Gopher-280B (few-shot, k=5) | 50.9 | No | Scaling Language Models: Methods, Analysis & Ins... | 2021-12-08 | Code |
| 13 | Chinchilla-70B (few-shot, k=5) | 13.1 | No | Training Compute-Optimal Large Language Models | 2022-03-29 | Code |
| 14 | Gopher-280B (few-shot, k=5) | 11.7 | No | Scaling Language Models: Methods, Analysis & Ins... | 2021-12-08 | Code |
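Because the metric is "higher is better", the rank column is simply the entries ordered by accuracy, largest first. The sketch below reproduces that ordering for a few rows copied from the table. It is a hypothetical illustration, not the site's actual code; the `Entry` type and `rank` helper are invented names, and the tie-breaking rule (Python's stable sort keeps input order for equal scores) is an assumption.

```python
from typing import List, NamedTuple, Tuple


class Entry(NamedTuple):
    """One leaderboard row (hypothetical type for illustration)."""
    model: str
    accuracy: float  # reported accuracy in percent; higher is better


# A few rows copied from the results table, deliberately out of order.
entries = [
    Entry("Gopher-280B (few-shot, k=5)", 69.7),
    Entry("Orca 2-7B", 84.31),
    Entry("Chinchilla-70B (few-shot, k=5)", 85.7),
    Entry("Orca 2-13B", 86.86),
]


def rank(rows: List[Entry]) -> List[Tuple[int, Entry]]:
    """Return (rank, entry) pairs ordered by accuracy, highest first.

    sorted() is stable, so rows with equal accuracy keep their input
    order; this tie-breaking rule is an assumption, not the site's.
    """
    ordered = sorted(rows, key=lambda e: e.accuracy, reverse=True)
    return list(enumerate(ordered, start=1))


ranked = rank(entries)
for position, entry in ranked:
    print(f"{position:>2}  {entry.model:<32} {entry.accuracy:6.2f}")
```

Running it lists Orca 2-13B first and Gopher-280B last, matching the relative order of those rows in the table.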