Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Question Answering on OpenBookQA

Metric: Accuracy (higher is better)
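
As a minimal sketch of how the accuracy metric can be computed, assuming each question has one gold answer label and the model outputs one predicted label per question (OpenBookQA is four-way multiple choice, so random guessing lands near 25). The function name `accuracy` and the sample labels are hypothetical, chosen for illustration; they are not taken from any evaluation harness used by the listed papers.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Return the percentage of predictions that exactly match the gold labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count exact label matches, then scale to a 0-100 percentage.
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


# Hypothetical answer keys for five four-way multiple-choice questions.
gold_labels = ["A", "C", "B", "D", "A"]
model_predictions = ["A", "C", "D", "D", "B"]

score = accuracy(model_predictions, gold_labels)
print(f"Accuracy: {score:.1f}")
```

Scores on the leaderboard are reported on the same 0 to 100 scale, so a value computed this way is directly comparable only if it is measured on the same test split.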

Results

| # | Model | Accuracy | Extra Data | Paper | Date | Code |
|---|-------|----------|------------|-------|------|------|
| 1 | GPT-4 + knowledge base | 95.9 | No | - | - | - |
| 2 | MVP-Tuning (ensemble) | 95.2 | No | - | - | - |
| 3 | PaLM 540B (Self Improvement, Self Consistency) | 94.4 | No | Large Language Models Can Self-Improve | 2022-10-20 | - |
| 4 | X-Reasoner | 94.2 | No | - | - | - |
| 5 | PaLM 540B (Self Improvement, CoT Prompting) | 93 | No | Large Language Models Can Self-Improve | 2022-10-20 | - |
| 6 | PaLM 540B (Self Improvement, Standard-Prompting) | 92 | No | Large Language Models Can Self-Improve | 2022-10-20 | - |
| 7 | DeBERTa-xxlarge 1.5B + MVP-Tuning | 91.3 | No | - | - | - |
| 8 | PaLM 540B (Self Consistency) | 90 | No | Large Language Models Can Self-Improve | 2022-10-20 | - |
| 9 | GrapeQA: PEGA+CANP | 90 | No | GrapeQA: GRaph Augmentation and Pruning to Enhan... | 2023-03-22 | - |
| 10 | GenMC 11B | 89.8 | No | Clues Before Answers: Generation-Enhanced Multip... | 2022-04-30 | Code |
| 11 | AristoRoBERTa + MVP-Tuning | 87.6 | No | - | - | - |
| 12 | AristoRoBERTa + Graph Soft Counter | 87.4 | No | GNN is a Counter? Revisiting GNN for Question An... | 2021-10-07 | - |
| 13 | UnifiedQA 11B | 87.2 | No | UnifiedQA: Crossing Format Boundaries With a Sin... | 2020-05-02 | Code |
| 14 | LLaMA-3 8B+MoSLoRA | 86.8 | No | Mixture-of-Subspaces in Low-Rank Adaptation | 2024-06-16 | Code |
| 15 | PaLM 540B (CoT Prompting) | 86.4 | No | Large Language Models Can Self-Improve | 2022-10-20 | - |
| 16 | LLaMA-3 8B + MixLoRA | 84.8 | No | MixLoRA: Enhancing Large Language Models Fine-Tu... | 2024-04-22 | Code |
| 17 | PaLM 540B (Standard-Prompting) | 84.4 | No | Large Language Models Can Self-Improve | 2022-10-20 | - |
| 18 | TTTTT 3B | 83.2 | No | Fusing Context Into Knowledge Graph for Commonse... | 2020-12-09 | Code |
| 19 | LLaMA-2 13B + MixLoRA | 83 | No | MixLoRA: Enhancing Large Language Models Fine-Tu... | 2024-04-22 | Code |
| 20 | AristoRoBERTa + QA-GNN | 82.8 | No | QA-GNN: Reasoning with Language Models and Knowl... | 2021-04-13 | Code |
| 21 | QA-GNN | 82.8 | No | QA-GNN: Reasoning with Language Models and Knowl... | 2021-04-13 | Code |
| 22 | DEKCOR | 82.4 | No | Fusing Context Into Knowledge Graph for Commonse... | 2020-12-09 | Code |
| 23 | GrapeQA: PEGA | 82 | No | GrapeQA: GRaph Augmentation and Pruning to Enhan... | 2023-03-22 | - |
| 24 | LLaMA-2 7B + MixLoRA | 81.6 | No | MixLoRA: Enhancing Large Language Models Fine-Tu... | 2024-04-22 | Code |
| 25 | AristoRoBERTa | 77.8 | No | QA-GNN: Reasoning with Language Models and Knowl... | 2021-04-13 | Code |
| 26 | BiLSTM max-out question-match (science fact + common knowledge fact) | 76.9 | No | Can a Suit of Armor Conduct Electricity? A New D... | 2018-09-08 | Code |
| 27 | Careful Selection | 72 | No | Careful Selection of Knowledge to solve Open Boo... | 2019-07-24 | - |
| 28 | GrapeQA: CANP | 66.2 | No | GrapeQA: GRaph Augmentation and Pruning to Enhan... | 2023-03-22 | - |
| 29 | GPT-3 175B (few-shot, k=32) | 65.4 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code |
| 30 | PaLM 2-L (1-shot) | 58.5 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 31 | OPT 66B (one-shot) | 58 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 32 | PaLM 2-S (1-shot) | 57.4 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 33 | BiLSTM max-out question-match (WordNet + science fact) | 56.3 | No | Can a Suit of Armor Conduct Electricity? A New D... | 2018-09-08 | Code |
| 34 | PaLM 2-M (1-shot) | 56.2 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 35 | BiLSTM max-out question-match (with a science fact) | 55.8 | No | Can a Suit of Armor Conduct Electricity? A New D... | 2018-09-08 | Code |
| 36 | Bloomberg GPT 50B (1-shot) | 51.6 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 37 | BLOOM 176B (2-shot) | 47.2 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 38 | GPT-NeoX 50B (2-shot) | 44.2 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Code |
| 39 | LaMini-GPT 1.5B | 39.8 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 40 | LaMini-T5 738M | 36 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 41 | LaMini-F-T5 783M | 34 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 42 | T5-Large 738M | 32.8 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 43 | GPT-2-XL 1.5B | 32 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 44 | FLAN-T5-Large 783M | 31.2 | No | LaMini-LM: A Diverse Herd of Distilled Models fr... | 2023-04-27 | Code |
| 45 | Random chance baseline | 25 | No | HellaSwag: Can a Machine Really Finish Your Sent... | 2019-05-19 | Code |