Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Coreference Resolution on Winograd Schema Challenge

Metric: Accuracy (higher is better)
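Each Winograd schema is a pronoun-resolution problem with exactly two candidate antecedents, and accuracy is simply the fraction of schemas where the model picks the correct one. A minimal sketch of this scoring, with an illustrative schema pair (the example data and helper names are not from any official evaluation harness):

```python
# Minimal sketch of Winograd-style accuracy scoring.
# Each schema: a sentence with an ambiguous pronoun, two candidate
# antecedents, and the correct choice. (Example data is illustrative;
# a real schema pair differs only in the "special word" big/small.)
schemas = [
    {
        "sentence": "The trophy doesn't fit in the suitcase because it is too big.",
        "pronoun": "it",
        "candidates": ["the trophy", "the suitcase"],
        "answer": "the trophy",
    },
    {
        "sentence": "The trophy doesn't fit in the suitcase because it is too small.",
        "pronoun": "it",
        "candidates": ["the trophy", "the suitcase"],
        "answer": "the suitcase",
    },
]

def accuracy(predict, schemas):
    """Fraction of schemas where the model picks the correct antecedent."""
    correct = sum(predict(s) == s["answer"] for s in schemas)
    return correct / len(schemas)

# A trivial baseline that always picks the first candidate gets exactly
# one of the two twin schemas right, i.e. 50% -- the random-chance
# baseline that also appears in the table below.
baseline = lambda s: s["candidates"][0]
print(accuracy(baseline, schemas))  # → 0.5
```

Because every schema comes in such a minimally contrasting pair, superficial cues cancel out and a model must actually resolve the pronoun to beat 50%.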


Results

| # | Model | Accuracy | Extra Data | Paper | Date | Code |
|---|-------|----------|------------|-------|------|------|
| 1 | PaLM 540B (fine-tuned) | 100 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code |
| 2 | Vega v2 6B (KD-based prompt transfer) | 98.6 | No | Toward Efficient Language Model Pretraining and ... | 2022-12-04 | - |
| 3 | UL2 20B (fine-tuned) | 98.1 | No | UL2: Unifying Language Learning Paradigms | 2022-05-10 | Code |
| 4 | Turing NLR v5 XXL 5.4B (fine-tuned) | 97.3 | No | Toward Efficient Language Model Pretraining and ... | 2022-12-04 | - |
| 5 | ST-MoE-32B 269B (fine-tuned) | 96.6 | No | ST-MoE: Designing Stable and Transferable Sparse... | 2022-02-17 | Code |
| 6 | DeBERTa-1.5B | 95.9 | No | DeBERTa: Decoding-enhanced BERT with Disentangle... | 2020-06-05 | Code |
| 7 | T5-XXL 11B (fine-tuned) | 93.8 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code |
| 8 | ST-MoE-L 4.1B (fine-tuned) | 93.3 | No | ST-MoE: Designing Stable and Transferable Sparse... | 2022-02-17 | Code |
| 9 | RoBERTa-WinoGrande 355M | 90.1 | No | WinoGrande: An Adversarial Winograd Schema Chall... | 2019-07-24 | Code |
| 10 | Flan-T5 XXL (zero-shot) | 89.82 | No | Scaling Instruction-Finetuned Language Models | 2022-10-20 | Code |
| 11 | PaLM 540B (5-shot) | 89.5 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code |
| 12 | PaLM 540B (0-shot) | 89.1 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code |
| 13 | PaLM 2-M (1-shot) | 88.1 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 14 | PaLM 2-L (1-shot) | 86.9 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 15 | FLAN 137B (prompt-tuned) | 86.5 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Code |
| 16 | PaLM 540B (1-shot) | 86.3 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code |
| 17 | TTTTT 3B (fine-tuned) | 84.6 | No | TTTTTackling WinoGrande Schemas | 2020-03-18 | - |
| 18 | PaLM 2-S (1-shot) | 84.6 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 19 | RoBERTa-DPR 355M | 83.1 | No | WinoGrande: An Adversarial Winograd Schema Chall... | 2019-07-24 | Code |
| 20 | FLAN 137B (zero-shot) | 80.8 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Code |
| 21 | GPT-3 175B (few-shot) | 80.1 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code |
| 22 | RoBERTa-large + G-DAug-Inf | 80 | No | Generative Data Augmentation for Commonsense Rea... | 2020-04-24 | Code |
| 23 | UL2 20B (0-shot) | 79.9 | No | UL2: Unifying Language Learning Paradigms | 2022-05-10 | Code |
| 24 | ALBERT-xxlarge 235M | 78.8 | No | Back to Square One: Artifact Detection, Training... | 2021-04-16 | - |
| 25 | Neo-6B (QA + WS) | 77.9 | No | Ask Me Anything: A simple strategy for prompting... | 2022-10-05 | Code |
| 26 | HNN | 75.1 | No | A Hybrid Neural Network Model for Commonsense Re... | 2019-07-27 | Code |
| 27 | Neo-6B (QA) | 74.7 | No | Ask Me Anything: A simple strategy for prompting... | 2022-10-05 | Code |
| 28 | RoBERTa-large 354M | 73.9 | No | Back to Square One: Artifact Detection, Training... | 2021-04-16 | - |
| 29 | GPT-2-XL 1.5B | 73.3 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 30 | BERTwiki 340M (fine-tuned on WSCR) | 72.5 | No | A Surprisingly Robust Trick for Winograd Schema ... | 2019-05-15 | Code |
| 31 | BERT-SocialIQA 340M | 72.5 | No | SocialIQA: Commonsense Reasoning about Social In... | 2019-04-22 | Code |
| 32 | BERT-large 340M (fine-tuned on WSCR) | 71.4 | No | A Surprisingly Robust Trick for Winograd Schema ... | 2019-05-15 | Code |
| 33 | GPT-2-XL 1.5B | 70.7 | No | - | - | Code |
| 34 | BERTwiki 340M (fine-tuned on half of WSCR) | 70.3 | No | A Surprisingly Robust Trick for Winograd Schema ... | 2019-05-15 | Code |
| 35 | LaMini-GPT 1.5B | 69.6 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 36 | GPT-2 Medium 774M (partial scoring) | 69.2 | No | How Reasonable are Common-Sense Reasoning Tasks:... | 2018-11-05 | Code |
| 37 | N-Grammer 343M | 68.3 | No | N-Grammer: Augmenting Transformers with latent n... | 2022-07-13 | Code |
| 38 | AlexaTM 20B | 68.3 | No | AlexaTM 20B: Few-Shot Learning Using a Large-Sca... | 2022-08-02 | Code |
| 39 | BERT-large 340M | 67 | No | SocialIQA: Commonsense Reasoning about Social In... | 2019-04-22 | Code |
| 40 | T5-Large 738M | 66.7 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 41 | T0-3B (CoT fine-tuned) | 66 | No | The CoT Collection: Improving Zero-shot and Few-... | 2023-05-23 | Code |
| 42 | KiC-770M | 65.4 | No | Knowledge-in-Context: Towards Knowledgeable Semi... | 2022-10-28 | - |
| 43 | GPT-2 Medium 774M (full scoring) | 64.5 | No | How Reasonable are Common-Sense Reasoning Tasks:... | 2018-11-05 | Code |
| 44 | LaMini-F-T5 783M | 64.1 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 45 | Ensemble of 14 LMs | 63.7 | No | A Simple Method for Commonsense Reasoning | 2018-06-07 | Code |
| 46 | H3 125M (3-shot, rank classification) | 63.5 | No | Hungry Hungry Hippos: Towards Language Modeling ... | 2022-12-28 | Code |
| 47 | DSSM | 63 | No | Unsupervised Deep Structured Semantic Models for... | 2019-04-03 | - |
| 48 | RoBERTa-base 125M | 63 | No | Back to Square One: Artifact Detection, Training... | 2021-04-16 | - |
| 49 | Word-level CNN+LSTM (partial scoring) | 62.6 | No | A Simple Method for Commonsense Reasoning | 2018-06-07 | Code |
| 50 | UDSSM-II (ensemble) | 62.4 | No | Unsupervised Deep Structured Semantic Models for... | 2019-04-03 | - |
| 51 | BERT-base 110M (fine-tuned on WSCR) | 62.3 | No | A Surprisingly Robust Trick for Winograd Schema ... | 2019-05-15 | Code |
| 52 | RoE-3B | 62.21 | No | Exploring the Benefits of Training Expert Langua... | 2023-02-07 | Code |
| 53 | BERT-large 340M | 62 | No | BERT: Pre-training of Deep Bidirectional Transfo... | 2018-10-11 | Code |
| 54 | GPT-2 Small 117M (partial scoring) | 61.5 | No | How Reasonable are Common-Sense Reasoning Tasks:... | 2018-11-05 | Code |
| 55 | H3 125M (0-shot, rank classification) | 61.5 | No | Hungry Hungry Hippos: Towards Language Modeling ... | 2022-12-28 | Code |
| 56 | BERT-large 340M | 61.4 | No | Back to Square One: Artifact Detection, Training... | 2021-04-16 | - |
| 57 | BERT-base 110M + MAS | 60.3 | No | Attention Is (not) All You Need for Commonsense ... | 2019-05-31 | Code |
| 58 | longdoc S (OntoNotes + PreCo + LitBank) | 60.1 | No | On Generalization in Coreference Resolution | 2021-09-20 | Code |
| 59 | longdoc S (ON + PreCo + LitBank + 30k pseudo-singletons) | 59.4 | No | On Generalization in Coreference Resolution | 2021-09-20 | Code |
| 60 | UDSSM-II | 59.2 | No | Unsupervised Deep Structured Semantic Models for... | 2019-04-03 | - |
| 61 | LaMini-T5 738M | 59 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 62 | Flipped-3B | 58.37 | No | Guess the Instruction! Flipped Learning Makes La... | 2022-10-06 | Code |
| 63 | KEE+NKAM (winner of WSC2016) | 58.3 | No | Commonsense Knowledge Enhanced Embeddings for So... | 2016-11-13 | - |
| 64 | Char-level CNN+LSTM (partial scoring) | 57.9 | No | A Simple Method for Commonsense Reasoning | 2018-06-07 | Code |
| 65 | UDSSM-I (ensemble) | 57.1 | No | Unsupervised Deep Structured Semantic Models for... | 2019-04-03 | - |
| 66 | Knowledge Hunter | 57.1 | No | A Knowledge Hunting Framework for Common Sense R... | 2018-10-02 | - |
| 67 | WKH | 57.1 | No | WinoGrande: An Adversarial Winograd Schema Chall... | 2019-07-24 | Code |
| 68 | BERT-base 110M | 56.5 | No | Back to Square One: Artifact Detection, Training... | 2021-04-16 | - |
| 69 | GPT-2 Small 117M (full scoring) | 55.7 | No | How Reasonable are Common-Sense Reasoning Tasks:... | 2018-11-05 | Code |
| 70 | ALBERT-base 11M | 55.4 | No | Back to Square One: Artifact Detection, Training... | 2021-04-16 | - |
| 71 | Pythia 12B (0-shot) | 54.8 | No | Pythia: A Suite for Analyzing Large Language Mod... | 2023-04-03 | Code |
| 72 | UDSSM-I | 54.5 | No | Unsupervised Deep Structured Semantic Models for... | 2019-04-03 | - |
| 73 | Subword-level Transformer LM | 54.1 | No | Attention Is All You Need | 2017-06-12 | Code |
| 74 | USSM + Supervised DeepNet + KB | 52.8 | No | Attention Is (not) All You Need for Commonsense ... | 2019-05-31 | Code |
| 75 | KEE+NKAM on WinoGrande | 52.8 | No | WinoGrande: An Adversarial Winograd Schema Chall... | 2019-07-24 | Code |
| 76 | USSM + KB | 52 | No | Attention Is (not) All You Need for Commonsense ... | 2019-05-31 | Code |
| 77 | Random chance baseline | 50 | No | Back to Square One: Artifact Detection, Training... | 2021-04-16 | - |
| 78 | Hybrid H3 125M (3-shot, logit scoring) | 43.3 | No | Hungry Hungry Hippos: Towards Language Modeling ... | 2022-12-28 | Code |
| 79 | Pythia 2.8B (0-shot) | 38.5 | No | Pythia: A Suite for Analyzing Large Language Mod... | 2023-04-03 | Code |
| 80 | Neo-6B (few-shot) | 36.5 | No | Ask Me Anything: A simple strategy for prompting... | 2022-10-05 | Code |
| 81 | Pythia 6.9B (0-shot) | 36.5 | No | Pythia: A Suite for Analyzing Large Language Mod... | 2023-04-03 | Code |
| 82 | Pythia 12B (5-shot) | 36.5 | No | Pythia: A Suite for Analyzing Large Language Mod... | 2023-04-03 | Code |