Question Answering on COPA

Metric: Accuracy in % (higher is better)

Results

#	Model	Accuracy	Extra Data	Paper	Date	Code
1	PaLM 540B (finetuned)	100	No	PaLM: Scaling Language Modeling with Pathways	2022-04-05	Code
2	Vega v2 6B (KD-based prompt transfer)	99.4	No	Toward Efficient Language Model Pretraining and ...	2022-12-04	-
3	ST-MoE-32B 269B (fine-tuned)	99.2	No	ST-MoE: Designing Stable and Transferable Sparse...	2022-02-17	Code
4	UL2 20B (fine-tuned)	99	No	UL2: Unifying Language Learning Paradigms	2022-05-10	Code
5	DeBERTa-Ensemble	98.4	No	DeBERTa: Decoding-enhanced BERT with Disentangle...	2020-06-05	Code
6	Turing NLR v5 XXL 5.4B (fine-tuned)	98.2	No	Toward Efficient Language Model Pretraining and ...	2022-12-04	-
7	DeBERTa-1.5B	96.8	No	DeBERTa: Decoding-enhanced BERT with Disentangle...	2020-06-05	Code
8	PaLM 2-L (1-shot)	96	No	PaLM 2 Technical Report	2023-05-17	Code
9	T5-XXL 11B (fine-tuned)	94.8	No	Exploring the Limits of Transfer Learning with a...	2019-10-23	Code
10	FLAN 137B (prompt-tuned)	94	No	Finetuned Language Models Are Zero-Shot Learners	2021-09-03	Code
11	GPT-3 175B (few-shot, k=32)	92	No	Language Models are Few-Shot Learners	2020-05-28	Code
12	T5-XL 3B (fine-tuned)	92	No	Exploring the Limits of Transfer Learning with a...	2019-10-23	Code
13	FLAN 137B (zero-shot)	91	No	Finetuned Language Models Are Zero-Shot Learners	2021-09-03	Code
14	ST-MoE-L 4.1B (fine-tuned)	91	No	ST-MoE: Designing Stable and Transferable Sparse...	2022-02-17	Code
15	GPT-3 175B (0-shot)	91	No	Language Models are Few-Shot Learners	2020-05-28	Code
16	T0-3B (CoT fine-tuned)	90.9	No	The CoT Collection: Improving Zero-shot and Few-...	2023-05-23	Code
17	RoBERTa-Winogrande-ft 355M (fine-tuned)	90.6	No	WinoGrande: An Adversarial Winograd Schema Chall...	2019-07-24	Code
18	PaLM 2-M (1-shot)	90	No	PaLM 2 Technical Report	2023-05-17	Code
19	Flipped-3B	89.88	No	Guess the Instruction! Flipped Learning Makes La...	2022-10-06	Code
20	PaLM 2-S (1-shot)	89	No	PaLM 2 Technical Report	2023-05-17	Code
21	GPT-NeoX (one-shot)	88	No	BloombergGPT: A Large Language Model for Finance	2023-03-30	Code
22	FLAN 137B (few-shot, k=16)	87	No	Finetuned Language Models Are Zero-Shot Learners	2021-09-03	Code
23	GPT-3 175B (1-shot)	87	No	Language Models are Few-Shot Learners	2020-05-28	Code
24	RoBERTa-ft 355M (fine-tuned)	86.4	No	WinoGrande: An Adversarial Winograd Schema Chall...	2019-07-24	Code
25	Bloomberg GPT (one-shot)	86	No	BloombergGPT: A Large Language Model for Finance	2023-03-30	Code
26	OPT 66B (one-shot)	86	No	BloombergGPT: A Large Language Model for Finance	2023-03-30	Code
27	GPT-3 13B (few-shot, k=32)	86	No	Language Models are Few-Shot Learners	2020-05-28	Code
28	KiC-770M	85.3	No	Knowledge-in-Context: Towards Knowledgeable Semi...	2022-10-28	-
29	UL2 20B (0-shot)	85	No	UL2: Unifying Language Learning Paradigms	2022-05-10	Code
30	RoBERTa-Winogrande 355M (fine-tuned)	84.4	No	WinoGrande: An Adversarial Winograd Schema Chall...	2019-07-24	Code
31	Neo-6B (QA + WS)	84	No	Ask Me Anything: A simple strategy for prompting...	2022-10-05	Code
32	BLOOM 176B (one-shot)	84	No	BloombergGPT: A Large Language Model for Finance	2023-03-30	Code
33	T5-Large 770M (fine-tuned)	83.4	No	Exploring the Limits of Transfer Learning with a...	2019-10-23	Code
34	BERT-SocialIQA 340M	83.4	No	SocialIQA: Commonsense Reasoning about Social In...	2019-04-22	Code
35	Hybrid H3 2.7B (0-shot, logit scoring)	81	No	Hungry Hungry Hippos: Towards Language Modeling ...	2022-12-28	Code
36	BERT-large 340M	80.8	No	SocialIQA: Commonsense Reasoning about Social In...	2019-04-22	Code
37	RoE-3B	79.25	No	Exploring the Benefits of Training Expert Langua...	2023-02-07	Code
38	sMLP – deterministic 9.4B (0-shot)	79	No	Efficient Language Modeling with Sparse all-MLP	2022-03-14	-
39	KELM (finetuning BERT-large based single model)	78	No	KELM: Knowledge Enhanced Pre-Trained Language Re...	2021-09-09	Code
40	AlexaTM 20B	78	No	AlexaTM 20B: Few-Shot Learning Using a Large-Sca...	2022-08-02	Code
41	Neo-6B (few-shot)	77	No	Ask Me Anything: A simple strategy for prompting...	2022-10-05	Code
42	Hybrid H3 2.7B (3-shot, logit scoring)	77	No	Hungry Hungry Hippos: Towards Language Modeling ...	2022-12-28	Code
43	Causal Strength w/multi-word predicates (presumably on WinoGrande?)	76.4	No	WinoGrande: An Adversarial Winograd Schema Chall...	2019-07-24	Code
44	Gshard 9B	76	No	Efficient Language Modeling with Sparse all-MLP	2022-03-14	-
45	Switch Transformer 9B	75	No	Efficient Language Modeling with Sparse all-MLP	2022-03-14	-
46	GPT-3 Large 760M (0-shot)	73	No	Language Models are Few-Shot Learners	2020-05-28	Code
47	Causal Strength Computation w/multi-word predicates (on ClueWeb12)	71.2	No	-	-	-
48	T5-Base 220M (fine-tuned)	71.2	No	Exploring the Limits of Transfer Learning with a...	2019-10-23	Code
49	Causal Strength Computation (on Causal Net)	70.2	No	-	-	-
50	Causal Strength Computation (on ClueWeb12)	69.9	No	-	-	-
51	Hybrid H3 125M (0-shot, logit scoring)	67	No	Hungry Hungry Hippos: Towards Language Modeling ...	2022-12-28	Code
52	Hybrid H3 125M (0-shot, rank classification)	67	No	Hungry Hungry Hippos: Towards Language Modeling ...	2022-12-28	Code
53	Pointwise Mutual Information (on 10M stories)	65.4	No	WinoGrande: An Adversarial Winograd Schema Chall...	2019-07-24	Code
54	HASH Layers 10B (0-shot)	64	No	Efficient Language Modeling with Sparse all-MLP	2022-03-14	-
55	Base Layers 10B (0-shot)	63	No	Efficient Language Modeling with Sparse all-MLP	2022-03-14	-
56	N-Grammer 343M	60	No	N-Grammer: Augmenting Transformers with latent n...	2022-07-13	Code
57	Pointwise Mutual Information (on Project Gutenberg)	58.8	No	-	-	-
58	Neo-6B (QA)	58.2	No	Ask Me Anything: A simple strategy for prompting...	2022-10-05	Code
59	H3 125M (0-shot, rank classification)	51	No	Hungry Hungry Hippos: Towards Language Modeling ...	2022-12-28	Code
60	Random chance baseline	50	No	Back to Square One: Artifact Detection, Training...	2021-04-16	-