Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Question Answering on COPA

Metric: Accuracy (higher is better)
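As a rough illustration of the metric, the sketch below computes accuracy as a percentage. It assumes each COPA item offers two alternatives, encoded here as 0 or 1, which is consistent with the random chance baseline of 50 listed in the results. The function name `accuracy_percent` and the toy data are hypothetical and are not taken from any listed paper or evaluation harness.

```python
def accuracy_percent(predictions, labels):
    """Percentage of examples whose predicted alternative matches the gold label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == g for p, g in zip(predictions, labels))
    return 100.0 * correct / len(labels)


# Hypothetical toy data: 0 or 1 selects one of the two alternatives.
gold = [0, 1, 1, 0, 1]
pred = [0, 1, 0, 0, 1]
print(accuracy_percent(pred, gold))  # 80.0
```

Individual papers may differ in which split they evaluate on and in how a model's choice is extracted (for example logit scoring versus rank classification, as noted in some model names), so scores in the list are not necessarily computed under identical conditions.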


Results

# | Model | Accuracy | Extra Data | Paper | Date | Code
1 | PaLM 540B (finetuned) | 100 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code
2 | Vega v2 6B (KD-based prompt transfer) | 99.4 | No | Toward Efficient Language Model Pretraining and ... | 2022-12-04 | -
3 | ST-MoE-32B 269B (fine-tuned) | 99.2 | No | ST-MoE: Designing Stable and Transferable Sparse... | 2022-02-17 | Code
4 | UL2 20B (fine-tuned) | 99 | No | UL2: Unifying Language Learning Paradigms | 2022-05-10 | Code
5 | DeBERTa-Ensemble | 98.4 | No | DeBERTa: Decoding-enhanced BERT with Disentangle... | 2020-06-05 | Code
6 | Turing NLR v5 XXL 5.4B (fine-tuned) | 98.2 | No | Toward Efficient Language Model Pretraining and ... | 2022-12-04 | -
7 | DeBERTa-1.5B | 96.8 | No | DeBERTa: Decoding-enhanced BERT with Disentangle... | 2020-06-05 | Code
8 | PaLM 2-L (1-shot) | 96 | No | PaLM 2 Technical Report | 2023-05-17 | Code
9 | T5-XXL 11B (fine-tuned) | 94.8 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code
10 | FLAN 137B (prompt-tuned) | 94 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Code
11 | GPT-3 175B (few-shot, k=32) | 92 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code
12 | T5-XL 3B (fine-tuned) | 92 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code
13 | FLAN 137B (zero-shot) | 91 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Code
14 | ST-MoE-L 4.1B (fine-tuned) | 91 | No | ST-MoE: Designing Stable and Transferable Sparse... | 2022-02-17 | Code
15 | GPT-3 175B (0-shot) | 91 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code
16 | T0-3B (CoT fine-tuned) | 90.9 | No | The CoT Collection: Improving Zero-shot and Few-... | 2023-05-23 | Code
17 | RoBERTa-Winogrande-ft 355M (fine-tuned) | 90.6 | No | WinoGrande: An Adversarial Winograd Schema Chall... | 2019-07-24 | Code
18 | PaLM 2-M (1-shot) | 90 | No | PaLM 2 Technical Report | 2023-05-17 | Code
19 | Flipped-3B | 89.88 | No | Guess the Instruction! Flipped Learning Makes La... | 2022-10-06 | Code
20 | PaLM 2-S (1-shot) | 89 | No | PaLM 2 Technical Report | 2023-05-17 | Code
21 | GPT-NeoX (one-shot) | 88 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code
22 | FLAN 137B (few-shot, k=16) | 87 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Code
23 | GPT-3 175B (1-shot) | 87 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code
24 | RoBERTa-ft 355M (fine-tuned) | 86.4 | No | WinoGrande: An Adversarial Winograd Schema Chall... | 2019-07-24 | Code
25 | Bloomberg GPT (one-shot) | 86 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code
26 | OPT 66B (one-shot) | 86 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code
27 | GPT-3 13B (few-shot, k=32) | 86 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code
28 | KiC-770M | 85.3 | No | Knowledge-in-Context: Towards Knowledgeable Semi... | 2022-10-28 | -
29 | UL2 20B (0-shot) | 85 | No | UL2: Unifying Language Learning Paradigms | 2022-05-10 | Code
30 | RoBERTa-Winogrande 355M (fine-tuned) | 84.4 | No | WinoGrande: An Adversarial Winograd Schema Chall... | 2019-07-24 | Code
31 | Neo-6B (QA + WS) | 84 | No | Ask Me Anything: A simple strategy for prompting... | 2022-10-05 | Code
32 | BLOOM 176B (one-shot) | 84 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code
33 | T5-Large 770M (fine-tuned) | 83.4 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code
34 | BERT-SocialIQA 340M | 83.4 | No | SocialIQA: Commonsense Reasoning about Social In... | 2019-04-22 | Code
35 | Hybrid H3 2.7B (0-shot, logit scoring) | 81 | No | Hungry Hungry Hippos: Towards Language Modeling ... | 2022-12-28 | Code
36 | BERT-large 340M | 80.8 | No | SocialIQA: Commonsense Reasoning about Social In... | 2019-04-22 | Code
37 | RoE-3B | 79.25 | No | Exploring the Benefits of Training Expert Langua... | 2023-02-07 | Code
38 | sMLP – deterministic 9.4B (0-shot) | 79 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | -
39 | KELM (finetuning BERT-large based single model) | 78 | No | KELM: Knowledge Enhanced Pre-Trained Language Re... | 2021-09-09 | Code
40 | AlexaTM 20B | 78 | No | AlexaTM 20B: Few-Shot Learning Using a Large-Sca... | 2022-08-02 | Code
41 | Neo-6B (few-shot) | 77 | No | Ask Me Anything: A simple strategy for prompting... | 2022-10-05 | Code
42 | Hybrid H3 2.7B (3-shot, logit scoring) | 77 | No | Hungry Hungry Hippos: Towards Language Modeling ... | 2022-12-28 | Code
43 | Causal Strength w/multi-word predicates (presumably on WinoGrande?) | 76.4 | No | WinoGrande: An Adversarial Winograd Schema Chall... | 2019-07-24 | Code
44 | Gshard 9B | 76 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | -
45 | Switch Transformer 9B | 75 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | -
46 | GPT-3 Large 760M (0-shot) | 73 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code
47 | Causal Strength Computation w/multi-word predicates (on ClueWeb12) | 71.2 | No | - | - | -
48 | T5-Base 220M (fine-tuned) | 71.2 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Code
49 | Causal Strength Computation (on Causal Net) | 70.2 | No | - | - | -
50 | Causal Strength Computation (on ClueWeb12) | 69.9 | No | - | - | -
51 | Hybrid H3 125M (0-shot, logit scoring) | 67 | No | Hungry Hungry Hippos: Towards Language Modeling ... | 2022-12-28 | Code
52 | Hybrid H3 125M (0-shot, rank classification) | 67 | No | Hungry Hungry Hippos: Towards Language Modeling ... | 2022-12-28 | Code
53 | Pointwise Mutual Information (on 10M stories) | 65.4 | No | WinoGrande: An Adversarial Winograd Schema Chall... | 2019-07-24 | Code
54 | HASH Layers 10B (0-shot) | 64 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | -
55 | Base Layers 10B (0-shot) | 63 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | -
56 | N-Grammer 343M | 60 | No | N-Grammer: Augmenting Transformers with latent n... | 2022-07-13 | Code
57 | Pointwise Mutual Information (on Project Gutenberg) | 58.8 | No | - | - | -
58 | Neo-6B (QA) | 58.2 | No | Ask Me Anything: A simple strategy for prompting... | 2022-10-05 | Code
59 | H3 125M (0-shot, rank classification) | 51 | No | Hungry Hungry Hippos: Towards Language Modeling ... | 2022-12-28 | Code
60 | Random chance baseline | 50 | No | Back to Square One: Artifact Detection, Training... | 2021-04-16 | -