Question Answering on COPA
Metric: Accuracy (higher is better)
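The scores on this page are accuracies on a 0-100 scale: the percentage of evaluation items for which the model's chosen answer matches the gold answer. A minimal sketch of that computation follows, assuming one gold label and one predicted label per item; the function name `accuracy_percent` and the toy data are illustrative and do not come from the leaderboard.

```python
def accuracy_percent(predictions, labels):
    """Return the percentage of predictions that match the gold labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot score an empty evaluation set")
    # Count items where the predicted choice equals the gold choice.
    correct = sum(p == g for p, g in zip(predictions, labels))
    return 100.0 * correct / len(labels)


# Hypothetical two-choice items (0 or 1): three of four answered correctly.
gold = [0, 1, 1, 0]
pred = [0, 1, 0, 0]
print(accuracy_percent(pred, gold))  # 75.0
```

With two answer choices per item, uniform random guessing is expected to score about 50, which matches the "Random chance baseline" entry in the results.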
Results
| # | Model | Accuracy | Tag | Extra Data | Paper | Date | Code |
|---|-------|----------|-----|------------|-------|------|------|
| 1 | PaLM 540B (finetuned) | 100 | SOTA | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Yes |
| 2 | Vega v2 6B (KD-based prompt transfer) | 99.4 | | No | Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE | 2022-12-04 | - |
| 3 | ST-MoE-32B 269B (fine-tuned) | 99.2 | SOTA | No | ST-MoE: Designing Stable and Transferable Sparse Expert Models | 2022-02-17 | Yes |
| 4 | UL2 20B (fine-tuned) | 99 | | No | UL2: Unifying Language Learning Paradigms | 2022-05-10 | Yes |
| 5 | DeBERTa-Ensemble | 98.4 | SOTA | No | DeBERTa: Decoding-enhanced BERT with Disentangled Attention | 2020-06-05 | Yes |
| 6 | Turing NLR v5 XXL 5.4B (fine-tuned) | 98.2 | | No | Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE | 2022-12-04 | - |
| 7 | DeBERTa-1.5B | 96.8 | | No | DeBERTa: Decoding-enhanced BERT with Disentangled Attention | 2020-06-05 | Yes |
| 8 | PaLM 2-L (1-shot) | 96 | | No | PaLM 2 Technical Report | 2023-05-17 | Yes |
| 9 | T5-XXL 11B (fine-tuned) | 94.8 | SOTA | No | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 2019-10-23 | Yes |
| 10 | FLAN 137B (prompt-tuned) | 94 | | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Yes |
| 11 | GPT-3 175B (few-shot, k=32) | 92 | | No | Language Models are Few-Shot Learners | 2020-05-28 | Yes |
| 12 | T5-XL 3B (fine-tuned) | 92 | | No | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 2019-10-23 | Yes |
| 13 | FLAN 137B (zero-shot) | 91 | | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Yes |
| 14 | ST-MoE-L 4.1B (fine-tuned) | 91 | | No | ST-MoE: Designing Stable and Transferable Sparse Expert Models | 2022-02-17 | Yes |
| 15 | GPT-3 175B (0-shot) | 91 | | No | Language Models are Few-Shot Learners | 2020-05-28 | Yes |
| 16 | T0-3B (CoT fine-tuned) | 90.9 | | No | The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning | 2023-05-23 | Yes |
| 17 | RoBERTa-Winogrande-ft 355M (fine-tuned) | 90.6 | SOTA | No | WinoGrande: An Adversarial Winograd Schema Challenge at Scale | 2019-07-24 | Yes |
| 18 | PaLM 2-M (1-shot) | 90 | | No | PaLM 2 Technical Report | 2023-05-17 | Yes |
| 19 | Flipped-3B | 89.88 | | No | Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners | 2022-10-06 | Yes |
| 20 | PaLM 2-S (1-shot) | 89 | | No | PaLM 2 Technical Report | 2023-05-17 | Yes |
| 21 | GPT-NeoX (one-shot) | 88 | | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Yes |
| 22 | FLAN 137B (few-shot, k=16) | 87 | | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Yes |
| 23 | GPT-3 175B (1-shot) | 87 | | No | Language Models are Few-Shot Learners | 2020-05-28 | Yes |
| 24 | RoBERTa-ft 355M (fine-tuned) | 86.4 | | No | WinoGrande: An Adversarial Winograd Schema Challenge at Scale | 2019-07-24 | Yes |
| 25 | Bloomberg GPT (one-shot) | 86 | | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Yes |
| 26 | OPT 66B (one-shot) | 86 | | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Yes |
| 27 | GPT-3 13B (few-shot, k=32) | 86 | | No | Language Models are Few-Shot Learners | 2020-05-28 | Yes |
| 28 | KiC-770M | 85.3 | | No | Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models | 2022-10-28 | - |
| 29 | UL2 20B (0-shot) | 85 | | No | UL2: Unifying Language Learning Paradigms | 2022-05-10 | Yes |
| 30 | RoBERTa-Winogrande 355M (fine-tuned) | 84.4 | | No | WinoGrande: An Adversarial Winograd Schema Challenge at Scale | 2019-07-24 | Yes |
| 31 | Neo-6B (QA + WS) | 84 | | No | Ask Me Anything: A simple strategy for prompting language models | 2022-10-05 | Yes |
| 32 | BLOOM 176B (one-shot) | 84 | | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Yes |
| 33 | T5-Large 770M (fine-tuned) | 83.4 | | No | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 2019-10-23 | Yes |
| 34 | BERT-SocialIQA 340M | 83.4 | SOTA | No | SocialIQA: Commonsense Reasoning about Social Interactions | 2019-04-22 | Yes |
| 35 | Hybrid H3 2.7B (0-shot, logit scoring) | 81 | | No | Hungry Hungry Hippos: Towards Language Modeling with State Space Models | 2022-12-28 | Yes |
| 36 | BERT-large 340M | 80.8 | | No | SocialIQA: Commonsense Reasoning about Social Interactions | 2019-04-22 | Yes |
| 37 | RoE-3B | 79.25 | | No | Exploring the Benefits of Training Expert Language Models over Instruction Tuning | 2023-02-07 | Yes |
| 38 | sMLP – deterministic 9.4B (0-shot) | 79 | | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - |
| 39 | KELM (finetuning BERT-large based single model) | 78 | | No | KELM: Knowledge Enhanced Pre-Trained Language Representations with Message Passing on Hierarchical Relational Graphs | 2021-09-09 | Yes |
| 40 | AlexaTM 20B | 78 | | No | AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model | 2022-08-02 | Yes |
| 41 | Neo-6B (few-shot) | 77 | | No | Ask Me Anything: A simple strategy for prompting language models | 2022-10-05 | Yes |
| 42 | Hybrid H3 2.7B (3-shot, logit scoring) | 77 | | No | Hungry Hungry Hippos: Towards Language Modeling with State Space Models | 2022-12-28 | Yes |
| 43 | Causal Strength w/multi-word predicates (presumably on WinoGrande?) | 76.4 | | No | WinoGrande: An Adversarial Winograd Schema Challenge at Scale | 2019-07-24 | Yes |
| 44 | Gshard 9B | 76 | | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - |
| 45 | Switch Transformer 9B | 75 | | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - |
| 46 | GPT-3 Large 760M (0-shot) | 73 | | No | Language Models are Few-Shot Learners | 2020-05-28 | Yes |
| 47 | Causal Strength Computation w/multi-word predicates (on ClueWeb12) | 71.2 | | No | - | - | - |
| 48 | T5-Base 220M (fine-tuned) | 71.2 | | No | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 2019-10-23 | Yes |
| 49 | Causal Strength Computation (on Causal Net) | 70.2 | | No | - | - | - |
| 50 | Causal Strength Computation (on ClueWeb12) | 69.9 | | No | - | - | - |
| 51 | Hybrid H3 125M (0-shot, logit scoring) | 67 | | No | Hungry Hungry Hippos: Towards Language Modeling with State Space Models | 2022-12-28 | Yes |
| 52 | Hybrid H3 125M (0-shot, rank classification) | 67 | | No | Hungry Hungry Hippos: Towards Language Modeling with State Space Models | 2022-12-28 | Yes |
| 53 | Pointwise Mutual Information (on 10M stories) | 65.4 | | No | WinoGrande: An Adversarial Winograd Schema Challenge at Scale | 2019-07-24 | Yes |
| 54 | HASH Layers 10B (0-shot) | 64 | | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - |
| 55 | Base Layers 10B (0-shot) | 63 | | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - |
| 56 | N-Grammer 343M | 60 | | No | N-Grammer: Augmenting Transformers with latent n-grams | 2022-07-13 | Yes |
| 57 | Pointwise Mutual Information (on Project Gutenberg) | 58.8 | | No | - | - | - |
| 58 | Neo-6B (QA) | 58.2 | | No | Ask Me Anything: A simple strategy for prompting language models | 2022-10-05 | Yes |
| 59 | H3 125M (0-shot, rank classification) | 51 | | No | Hungry Hungry Hippos: Towards Language Modeling with State Space Models | 2022-12-28 | Yes |
| 60 | Random chance baseline | 50 | | No | Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema | 2021-04-16 | - |