Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Common Sense Reasoning on WinoGrande

Metric: Accuracy (higher is better)
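The metric is plain accuracy: each WinoGrande item is a sentence with a blank and two candidate fillers, and a model scores a point when it selects the labeled option. A minimal sketch of that scoring loop, using toy examples and `len` as a stand-in scoring function (not any listed model's actual method):

```python
def accuracy(predictions, labels):
    """Fraction of items where the predicted option index matches the gold label."""
    assert len(predictions) == len(labels)
    return sum(p == g for p, g in zip(predictions, labels)) / len(labels)

# Each WinoGrande item: a sentence with "_", two candidate fillers, a gold index.
items = [
    {"sentence": "The trophy didn't fit in the suitcase because _ was too big.",
     "options": ["the trophy", "the suitcase"], "label": 0},
    {"sentence": "The trophy didn't fit in the suitcase because _ was too small.",
     "options": ["the trophy", "the suitcase"], "label": 1},
]

def pick_option(item, score_fn):
    """Fill the blank with each option, score both sentences, pick the higher one."""
    scores = [score_fn(item["sentence"].replace("_", opt)) for opt in item["options"]]
    return max(range(len(scores)), key=scores.__getitem__)

# `len` stands in for a real language-model score (e.g. sentence log-likelihood).
preds = [pick_option(it, score_fn=len) for it in items]
print(accuracy(preds, [it["label"] for it in items]))  # → 0.5
```

A real evaluation would replace `score_fn` with the model's log-probability of each completed sentence; the 50-point "Random baseline" at the bottom of the leaderboard reflects the two-option format.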


Results

| # | Model | Accuracy | Extra Data | Paper | Date | Code |
|---|-------|----------|------------|-------|------|------|
| 1 | ST-MoE-32B 269B (fine-tuned) | 96.1 | No | ST-MoE: Designing Stable and Transferable Sparse... | 2022-02-17 | Code |
| 2 | Unicorn 11B (fine-tuned) | 91.3 | No | UNICORN on RAINBOW: A Universal Commonsense Reas... | 2021-03-24 | Code |
| 3 | CompassMTL 567M with Tailor | 90.5 | No | Task Compass: Scaling Multi-task Pre-training wi... | 2022-10-12 | Code |
| 4 | CompassMTL 567M | 89.6 | No | Task Compass: Scaling Multi-task Pre-training wi... | 2022-10-12 | Code |
| 5 | UnifiedQA 11B (fine-tuned) | 89.4 | No | UnifiedQA: Crossing Format Boundaries With a Sin... | 2020-05-02 | Code |
| 6 | Claude 3 Opus (5-shot) | 88.5 | No | - | - | - |
| 7 | GPT-4 (5-shot) | 87.5 | No | GPT-4 Technical Report | 2023-03-15 | Code |
| 8 | ExDeBERTa 567M | 87 | No | Task Compass: Scaling Multi-task Pre-training wi... | 2022-10-12 | Code |
| 9 | LLaMA-2 13B + MixLoRA | 86.3 | No | MixLoRA: Enhancing Large Language Models Fine-Tu... | 2024-04-22 | Code |
| 10 | LLaMA3 8B+MoSLoRA | 85.8 | No | Mixture-of-Subspaces in Low-Rank Adaptation | 2024-06-16 | Code |
| 11 | PaLM 2-L (1-shot) | 83 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 12 | LLaMA-3 8B + MixLoRA | 82.1 | No | MixLoRA: Enhancing Large Language Models Fine-Tu... | 2024-04-22 | Code |
| 13 | ST-MoE-L 4.1B (fine-tuned) | 81.7 | No | ST-MoE: Designing Stable and Transferable Sparse... | 2022-02-17 | Code |
| 14 | GPT-3.5 (5-shot) | 81.6 | No | GPT-4 Technical Report | 2023-03-15 | Code |
| 15 | PaLM 540B (0-shot) | 81.1 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code |
| 16 | Camelidae-8×34B | 80.9 | No | Parameter-Efficient Sparsity Crafting from Dense... | 2024-01-05 | Code |
| 17 | PaLM 2-M (1-shot) | 79.2 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 18 | RoBERTa-Winogrande 355M (fine-tuned) | 79.1 | No | WinoGrande: An Adversarial Winograd Schema Chall... | 2019-07-24 | Code |
| 19 | PaLM 2-S (1-shot) | 77.9 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 20 | Mixtral 8x7B (0-shot) | 77.2 | No | Mixtral of Experts | 2024-01-08 | Code |
| 21 | PaLM 62B (0-shot) | 77 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code |
| 22 | PaLM-cont 62B (0-shot) | 77 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code |
| 23 | LLaMA 65B (0-shot) | 77 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 24 | LLaMA-2 7B + MixLoRA | 76.8 | No | MixLoRA: Enhancing Large Language Models Fine-Tu... | 2024-04-22 | Code |
| 25 | LLaMA 33B (0-shot) | 76 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 26 | Mistral 7B (0-shot) | 75.3 | No | Mistral 7B | 2023-10-10 | Code |
| 27 | Claude 3 Sonnet (5-shot) | 75.1 | No | - | - | - |
| 28 | Chinchilla 70B (0-shot) | 74.9 | No | Training Compute-Optimal Large Language Models | 2022-03-29 | Code |
| 29 | Claude 3 Haiku (5-shot) | 74.2 | No | - | - | - |
| 30 | Mistral 7B (0-shot) | 74.2 | No | Mixtral of Experts | 2024-01-08 | Code |
| 31 | phi-1.5-web 1.3B (zero-shot) | 74 | No | Textbooks Are All You Need II: phi-1.5 technical... | 2023-09-11 | Code |
| 32 | Unified QA 406M (fine-tuned) | 73.3 | No | UnifiedQA: Crossing Format Boundaries With a Sin... | 2020-05-02 | Code |
| 33 | LLaMA 13B (0-shot) | 73 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 34 | FLAN 137B (few-shot, k=16) | 72.8 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Code |
| 35 | G-DAUG-Combo + RoBERTa-Large | 71.4 | No | Generative Data Augmentation for Commonsense Rea... | 2020-04-24 | Code |
| 36 | FLAN 137B (0-shot) | 71.2 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Code |
| 37 | RWKV v5 Eagle 7B | 70.8 | No | - | - | - |
| 38 | Branch-Train-MiX 4x7B (sampling top-1 expert) | 70.6 | No | Branch-Train-MiX: Mixing Expert LLMs into a Mixt... | 2024-03-12 | Code |
| 39 | GPT-3 175B (0-shot) | 70.2 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code |
| 40 | Gopher 280B (0-shot) | 70.1 | No | Scaling Language Models: Methods, Analysis & Ins... | 2021-12-08 | Code |
| 41 | LLaMA 7B (0-shot) | 70.1 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 42 | BLOOM 176B (1-shot) | 67 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 43 | Pythia 12B (5-shot) | 66.6 | No | Pythia: A Suite for Analyzing Large Language Mod... | 2023-04-03 | Code |
| 44 | OPT 66B (1-shot) | 66.1 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 45 | BERT-Winogrande 345M (fine-tuned) | 64.9 | No | WinoGrande: An Adversarial Winograd Schema Chall... | 2019-07-24 | Code |
| 46 | Bloomberg GPT (one-shot) | 64.1 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 47 | Pythia 12B (0-shot) | 63.9 | No | Pythia: A Suite for Analyzing Large Language Mod... | 2023-04-03 | Code |
| 48 | RoE-3B | 61.6 | No | Exploring the Benefits of Training Expert Langua... | 2023-02-07 | Code |
| 49 | Pythia 6.9B (0-shot) | 60.9 | No | Pythia: A Suite for Analyzing Large Language Mod... | 2023-04-03 | Code |
| 50 | GPT-NeoX (one-shot) | 60.6 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 51 | FLAN-T5-Large 783M | 59.9 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 52 | Pythia 2.8B (0-shot) | 59.4 | No | Pythia: A Suite for Analyzing Large Language Mod... | 2023-04-03 | Code |
| 53 | RoBERTa-DPR 355M (0-shot) | 58.9 | No | WinoGrande: An Adversarial Winograd Schema Chall... | 2019-07-24 | Code |
| 54 | ALBERT-xxlarge 235M | 58.7 | No | Back to Square One: Artifact Detection, Training... | 2021-04-16 | - |
| 55 | Flipped-3B | 58.56 | No | Guess the Instruction! Flipped Learning Makes La... | 2022-10-06 | Code |
| 56 | GPT-2-XL 1.5B | 58.3 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 57 | T0-3B (CoT fine-tuned) | 57.5 | No | The CoT Collection: Improving Zero-shot and Few-... | 2023-05-23 | Code |
| 58 | GPT-3 Large 760M (0-shot) | 57.4 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code |
| 59 | RoBERTa-base 125M | 56.3 | No | Back to Square One: Artifact Detection, Training... | 2021-04-16 | - |
| 60 | LaMini-F-T5 783M | 56 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 61 | LaMini-GPT 1.5B | 56 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 62 | BERT-large 345M | 55.6 | No | Back to Square One: Artifact Detection, Training... | 2021-04-16 | - |
| 63 | KiC-770M | 55.3 | No | Knowledge-in-Context: Towards Knowledgeable Semi... | 2022-10-28 | - |
| 64 | T5-Large 738M | 55.2 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 65 | LaMini-T5 738M | 54.9 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 66 | RoBERTa-large 355M | 54.9 | No | Back to Square One: Artifact Detection, Training... | 2021-04-16 | - |
| 67 | sMLP – deterministic 9.4B (0-shot) | 54.3 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - |
| 68 | Switch Transformer 9B (0-shot) | 53.4 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - |
| 69 | BERT-base 110M | 53.1 | No | Back to Square One: Artifact Detection, Training... | 2021-04-16 | - |
| 70 | ALBERT-base 11M | 52.8 | No | Back to Square One: Artifact Detection, Training... | 2021-04-16 | - |
| 71 | BERT-large 345M (0-shot) | 51.9 | No | WinoGrande: An Adversarial Winograd Schema Chall... | 2019-07-24 | Code |
| 72 | HASH Layers 10B (0-shot) | 51.7 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - |
| 73 | Gshard 9B (0-shot) | 51.1 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - |
| 74 | Base Layers 10B (0-shot) | 51 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - |
| 75 | BERT-DPR 345M (0-shot) | 51 | No | WinoGrande: An Adversarial Winograd Schema Chall... | 2019-07-24 | Code |
| 76 | Random baseline | 50 | No | Back to Square One: Artifact Detection, Training... | 2021-04-16 | - |
| 77 | RoBERTa-large 355M (0-shot) | 50 | No | WinoGrande: An Adversarial Winograd Schema Chall... | 2019-07-24 | Code |