Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Question Answering on PIQA

Metric: Accuracy (higher is better)


Results

| # | Model | Accuracy | Extra Data | Paper | Date | Code |
|---|-------|----------|------------|-------|------|------|
| 1 | Unicorn 11B (fine-tuned) | 90.1 | No | UNICORN on RAINBOW: A Universal Commonsense Reas... | 2021-03-24 | Code |
| 2 | LLaMA3 8B+MoSLoRA | 89.7 | No | Mixture-of-Subspaces in Low-Rank Adaptation | 2024-06-16 | Code |
| 3 | CompassMTL 567M with Tailor | 88.3 | No | Task Compass: Scaling Multi-task Pre-training wi... | 2022-10-12 | Code |
| 4 | LLaMA-3 8B + MixLoRA | 87.6 | No | MixLoRA: Enhancing Large Language Models Fine-Tu... | 2024-04-22 | Code |
| 5 | DeBERTa-Large 304M | 87.4 | No | Two is Better than Many? Binary Classification a... | 2022-10-29 | Code |
| 6 | CompassMTL 567M | 87.3 | No | Task Compass: Scaling Multi-task Pre-training wi... | 2022-10-12 | Code |
| 7 | LLaMA-2 13B + MixLoRA | 86.8 | No | MixLoRA: Enhancing Large Language Models Fine-Tu... | 2024-04-22 | Code |
| 8 | Shakti-LLM (2.5B) | 86.2 | No | SHAKTI: A 2.5 Billion Parameter Small Language M... | 2024-10-15 | - |
| 9 | DeBERTa-Large 304M (classification-based) | 85.9 | No | Two is Better than Many? Binary Classification a... | 2022-10-29 | Code |
| 10 | ExDeBERTa 567M | 85.5 | No | Task Compass: Scaling Multi-task Pre-training wi... | 2022-10-12 | Code |
| 11 | UnifiedQA 3B | 85.3 | No | UnifiedQA: Crossing Format Boundaries With a Sin... | 2020-05-02 | Code |
| 12 | PaLM 2-L (1-shot) | 85 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 13 | Mixtral 8x7B (0-shot) | 83.6 | No | Mixtral of Experts | 2024-01-08 | Code |
| 14 | PaLM 2-M (1-shot) | 83.2 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 15 | LLaMA-2 7B + MixLoRA | 83.2 | No | MixLoRA: Enhancing Large Language Models Fine-Tu... | 2024-04-22 | Code |
| 16 | Mistral 7B (0-shot) | 83 | No | Mistral 7B | 2023-10-10 | Code |
| 17 | LLaMA 65B (0-shot) | 82.8 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 18 | LLaMA 2 70B (0-shot) | 82.8 | No | Llama 2: Open Foundation and Fine-Tuned Chat Mod... | 2023-07-18 | Code |
| 19 | Camelidae-8×34B | 82.7 | No | Parameter-Efficient Sparsity Crafting from Dense... | 2024-01-05 | Code |
| 20 | LLaMA 33B (0-shot) | 82.3 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 21 | PaLM 2-S (1-shot) | 82.2 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 22 | Mistral 7B (0-shot) | 82.2 | No | Mixtral of Experts | 2024-01-08 | Code |
| 23 | MT-NLG 530B (0-shot) | 82 | No | Megatron-LM: Training Multi-Billion Parameter La... | 2019-09-17 | Code |
| 24 | LLaMA 2 34B (0-shot) | 81.9 | No | Llama 2: Open Foundation and Fine-Tuned Chat Mod... | 2023-07-18 | Code |
| 25 | Gopher 280B (0-shot) | 81.8 | No | Scaling Language Models: Methods, Analysis & Ins... | 2021-12-08 | Code |
| 26 | Chinchilla 70B (0-shot) | 81.8 | No | Training Compute-Optimal Large Language Models | 2022-03-29 | Code |
| 27 | FLAN 137B (few-shot, k=10) | 81.7 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Code |
| 28 | OPT-175B | 81.07 | No | SparseGPT: Massive Language Models Can Be Accura... | 2023-01-02 | Code |
| 29 | GPT-3 175B (0-shot) | 81 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code |
| 30 | SparseGPT 175B (50% Sparsity) | 80.63 | No | SparseGPT: Massive Language Models Can Be Accura... | 2023-01-02 | Code |
| 31 | FLAN 137B (0-shot) | 80.5 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Code |
| 32 | LLaMA 2 13B (0-shot) | 80.5 | No | Llama 2: Open Foundation and Fine-Tuned Chat Mod... | 2023-07-18 | Code |
| 33 | LLaMA 13B (0-shot) | 80.1 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 34 | LLaMA 7B (0-shot) | 79.8 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 35 | SparseGPT 175B (4:8 Sparsity) | 79.54 | No | SparseGPT: Massive Language Models Can Be Accura... | 2023-01-02 | Code |
| 36 | SparseGPT 175B (2:4 Sparsity) | 79.54 | No | SparseGPT: Massive Language Models Can Be Accura... | 2023-01-02 | Code |
| 37 | RoBERTa-Large 355M | 79.4 | No | RoBERTa: A Robustly Optimized BERT Pretraining A... | 2019-07-26 | Code |
| 38 | LLaMA 2 7B (0-shot) | 78.8 | No | Llama 2: Open Foundation and Fine-Tuned Chat Mod... | 2023-07-18 | Code |
| 39 | Bloomberg GPT 50B (1-shot) | 77.9 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 40 | OPT 66B (1-shot) | 77.6 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 41 | RoBERTa-large 355M (fine-tuned) | 77.1 | No | PIQA: Reasoning about Physical Commonsense in Na... | 2019-11-26 | Code |
| 42 | phi-1.5-web (1.3B) | 77 | No | Textbooks Are All You Need II: phi-1.5 technical... | 2023-09-11 | Code |
| 43 | BLOOM 176B (1-shot) | 77 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 44 | Pythia 12B (5-shot) | 76.7 | No | Pythia: A Suite for Analyzing Large Language Mod... | 2023-04-03 | Code |
| 45 | Open-LLaMA-3B-v2 | 76.2 | No | Sheared LLaMA: Accelerating Language Model Pre-t... | 2023-10-10 | Code |
| 46 | Pythia 12B (0-shot) | 76 | No | Pythia: A Suite for Analyzing Large Language Mod... | 2023-04-03 | Code |
| 47 | Sheared-LLaMA-2.7B | 75.8 | No | Sheared LLaMA: Accelerating Language Model Pre-t... | 2023-10-10 | Code |
| 48 | GPT-NeoX 20B (1-shot) | 75.8 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 49 | Pythia 6.9B (0-shot) | 75.2 | No | Pythia: A Suite for Analyzing Large Language Mod... | 2023-04-03 | Code |
| 50 | Sheared-LLaMA-1.3B | 73.4 | No | Sheared LLaMA: Accelerating Language Model Pre-t... | 2023-10-10 | Code |
| 51 | sMLP - deterministic 9.4B (0-shot) | 73 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - |
| 52 | GPT-3 Large 760M (0-shot) | 72.9 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code |
| 53 | FLAN-T5-Large 783M | 72.2 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 54 | LaMini-GPT 1.5B | 71.3 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 55 | LaMini-F-T5 783M | 70.6 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 56 | GPT-2-XL 1.5B | 70.5 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 57 | Pythia 1B (5-shot) | 70.4 | No | Pythia: A Suite for Analyzing Large Language Mod... | 2023-04-03 | Code |
| 58 | GPT-2-small 124M (fine-tuned) | 69.2 | No | PIQA: Reasoning about Physical Commonsense in Na... | 2019-11-26 | Code |
| 59 | Gshard 9B | 68.1 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - |
| 60 | LaMini-T5 738M | 67.2 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 61 | BERT-large 340M (fine-tuned) | 66.8 | No | PIQA: Reasoning about Physical Commonsense in Na... | 2019-11-26 | Code |
| 62 | BERT-Large 340M | 66.7 | No | BERT: Pre-training of Deep Bidirectional Transfo... | 2018-10-11 | Code |
| 63 | Base Layers 10B (0-shot) | 63.8 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - |
| 64 | HASH Layers 10B (0-shot) | 63.8 | No | Efficient Language Modeling with Sparse all-MLP | 2022-03-14 | - |
| 65 | T5-Large 738M | 55.9 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 66 | OPT-175B (50% Sparsity) | 54.73 | No | SparseGPT: Massive Language Models Can Be Accura... | 2023-01-02 | Code |
| 67 | Random chance baseline | 50 | No | Back to Square One: Artifact Detection, Training... | 2021-04-16 | Code |
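
PIQA is a two-way multiple-choice benchmark, which is why the random-chance baseline in the last row sits at 50. A minimal sketch of how the accuracy metric is computed, with purely illustrative labels and predictions (not taken from any model above):

```python
def accuracy(predictions, gold_labels):
    """Percentage of examples where the predicted choice matches the gold label."""
    correct = sum(p == g for p, g in zip(predictions, gold_labels))
    return 100.0 * correct / len(gold_labels)

# Toy example: 4 examples, each label is 0 or 1, indicating which of the
# two candidate solutions is the physically plausible one.
gold = [0, 1, 1, 0]
preds = [0, 1, 0, 0]  # hypothetical model picks

print(accuracy(preds, gold))  # 75.0
```

A model that guesses uniformly at random over the two choices gets each example right with probability 0.5, giving the 50% floor shown in the table.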