Data sourced from the PWC Archive (CC-BY-SA 4.0).


Common Sense Reasoning on CommonsenseQA

Metric: Accuracy (higher is better)
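Accuracy here is the share of questions for which the model picks the gold answer, reported as a percentage. The following is a minimal sketch of that computation. It assumes predictions and gold labels are given as answer-choice letters (CommonsenseQA questions have five choices, A to E). The function name `accuracy` and the example answer keys are hypothetical, not part of any official evaluation script.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Return the percentage of items whose predicted choice matches the gold choice.

    Hypothetical helper: assumes one predicted letter per question, aligned with gold.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold labels must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count exact matches between predicted and gold answer letters.
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


if __name__ == "__main__":
    # Hypothetical answer keys for four questions.
    preds = ["A", "C", "E", "B"]
    golds = ["A", "C", "D", "B"]
    print(accuracy(preds, golds))  # 75.0
```

Because higher accuracy is better, entries are ordered by this value in descending order.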

Results

| # | Model | Accuracy (%) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | GPT-4o (HPT) | 92.54 | No | Hierarchical Prompting Taxonomy: A Universal Eva... | 2024-06-18 | Yes |
| 2 | DeBERTaV3-large+KEAR | 91.2 | Yes | Human Parity on CommonsenseQA: Augmenting Self-A... | 2021-12-06 | Yes |
| 3 | PaLM 2 (few-shot, CoT, SC) | 90.4 | Yes | PaLM 2 Technical Report | 2023-05-17 | Yes |
| 4 | KEAR | 89.4 | Yes | Human Parity on CommonsenseQA: Augmenting Self-A... | 2021-12-06 | Yes |
| 5 | DEKCOR | 83.3 | Yes | Fusing Context Into Knowledge Graph for Commonse... | 2020-12-09 | Yes |
| 6 | Unicorn 11B (fine-tuned) | 79.3 | No | UNICORN on RAINBOW: A Universal Commonsense Reas... | 2021-03-24 | Yes |
| 7 | MUPPET Roberta Large | 79.2 | Yes | Muppet: Massive Multi-task Representations with ... | 2021-01-26 | Yes |
| 8 | UnifiedQA 11B (fine-tuned) | 79.1 | Yes | UnifiedQA: Crossing Format Boundaries With a Sin... | 2020-05-02 | Yes |
| 9 | DRAGON | 78.2 | No | Deep Bidirectional Language-Knowledge Graph Pret... | 2022-10-17 | Yes |
| 10 | T5-XXL 11B (fine-tuned) | 78.1 | No | UnifiedQA: Crossing Format Boundaries With a Sin... | 2020-05-02 | Yes |
| 11 | Albert Lan et al. (2020) (ensemble) | 76.5 | No | ALBERT: A Lite BERT for Self-supervised Learning... | 2019-09-26 | Yes |
| 12 | UnifiedQA 11B (zero-shot) | 76.2 | No | UnifiedQA: Crossing Format Boundaries With a Sin... | 2020-05-02 | Yes |
| 13 | QA-GNN | 76.1 | No | QA-GNN: Reasoning with Language Models and Knowl... | 2021-04-13 | Yes |
| 14 | XLNet+GraphReason | 75.3 | No | Graph-Based Reasoning over Heterogeneous Externa... | 2019-09-09 | Yes |
| 15 | GrapeQA: PEGA | 73.5 | No | GrapeQA: GRaph Augmentation and Pruning to Enhan... | 2023-03-22 | No |
| 16 | RoBERTa+HyKAS Ma et al. (2019) | 73.2 | No | Towards Generalizable Neuro-Symbolic Systems for... | 2019-10-30 | No |
| 17 | GPT-3 Direct Finetuned | 73 | No | Human Parity on CommonsenseQA: Augmenting Self-A... | 2021-12-06 | Yes |
| 18 | STaR (on GPT-J) | 72.3 | No | STaR: Bootstrapping Reasoning With Reasoning | 2022-03-28 | Yes |
| 19 | RoBERTa-Large 355M | 72.1 | No | RoBERTa: A Robustly Optimized BERT Pretraining A... | 2019-07-26 | Yes |
| 20 | STaR without Rationalization (on GPT-J) | 68.8 | No | STaR: Bootstrapping Reasoning With Reasoning | 2022-03-28 | Yes |
| 21 | OPT 66B (1-shot) | 66.4 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Yes |
| 22 | Bloomberg GPT 50B (1-shot) | 65.5 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Yes |
| 23 | CAGE-reasoning | 64.7 | No | Explain Yourself! Leveraging Language Models for... | 2019-06-06 | Yes |
| 24 | BLOOM 176B (1-shot) | 64.2 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Yes |
| 25 | UnifiedQA 440M (fine-tuned) | 64 | No | UnifiedQA: Crossing Format Boundaries With a Sin... | 2020-05-02 | Yes |
| 26 | BART-large 440M (fine-tuned) | 62.5 | No | UnifiedQA: Crossing Format Boundaries With a Sin... | 2020-05-02 | Yes |
| 27 | BERT_CSlarge | 62.2 | No | Align, Mask and Select: A Simple Method for Inco... | 2019-08-19 | No |
| 28 | GPT-NeoX 20B (1-shot) | 60.4 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Yes |
| 29 | GPT-J Direct Finetuned | 60 | No | STaR: Bootstrapping Reasoning With Reasoning | 2022-03-28 | Yes |
| 30 | KagNet | 58.9 | Yes | KagNet: Knowledge-Aware Graph Networks for Commo... | 2019-09-04 | Yes |
| 31 | BERT-LARGE | 55.9 | Yes | CommonsenseQA: A Question Answering Challenge Ta... | 2018-11-02 | Yes |
| 32 | UL2 20B (chain-of-thought + self-consistency) | 55.7 | No | UL2: Unifying Language Learning Paradigms | 2022-05-10 | Yes |
| 33 | Few-shot CoT LaMDA 137B | 55.6 | No | STaR: Bootstrapping Reasoning With Reasoning | 2022-03-28 | Yes |
| 34 | UL2 20B (chain-of-thought) | 51.4 | No | UL2: Unifying Language Learning Paradigms | 2022-05-10 | Yes |
| 35 | Few-shot CoT GPT-J | 36.6 | No | STaR: Bootstrapping Reasoning With Reasoning | 2022-03-28 | Yes |
| 36 | UL2 20B (zero-shot) | 34.2 | No | UL2: Unifying Language Learning Paradigms | 2022-05-10 | Yes |
| 37 | Chain of thought ASDiv | 28.6 | No | Chain-of-Thought Prompting Elicits Reasoning in ... | 2022-01-28 | Yes |
| 38 | Few-shot Direct GPT-J | 20.9 | No | STaR: Bootstrapping Reasoning With Reasoning | 2022-03-28 | Yes |
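Comparisons are often restricted to entries that did not use extra data. The following is a minimal sketch of re-ranking the results under that restriction. It assumes the Extra Data column marks entries trained with data beyond the benchmark's own training set. The `Entry` class and `rank` function are hypothetical helpers, and only the first six rows of the results table are copied in for brevity.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Entry:
    model: str
    accuracy: float  # percent, as reported on the leaderboard
    extra_data: bool  # assumed meaning: trained with data beyond CommonsenseQA


# First six rows copied from the results table.
ENTRIES = [
    Entry("GPT-4o (HPT)", 92.54, False),
    Entry("DeBERTaV3-large+KEAR", 91.2, True),
    Entry("PaLM 2 (few-shot, CoT, SC)", 90.4, True),
    Entry("KEAR", 89.4, True),
    Entry("DEKCOR", 83.3, True),
    Entry("Unicorn 11B (fine-tuned)", 79.3, False),
]


def rank(entries: Iterable[Entry], allow_extra_data: bool = True) -> List[Entry]:
    """Sort by accuracy, highest first; optionally drop entries that used extra data."""
    kept = [e for e in entries if allow_extra_data or not e.extra_data]
    return sorted(kept, key=lambda e: e.accuracy, reverse=True)


if __name__ == "__main__":
    for position, entry in enumerate(rank(ENTRIES, allow_extra_data=False), start=1):
        print(position, entry.model, entry.accuracy)
```

With extra-data entries excluded, only GPT-4o (HPT) and Unicorn 11B (fine-tuned) remain among these six rows, in that order.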