Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Question Answering on BoolQ

Metric: Accuracy (higher is better)
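Accuracy here is the share of BoolQ questions whose predicted yes/no answer matches the gold label, reported as a percentage (so 92.4 means 92.4% of questions answered correctly). A minimal sketch, assuming predictions and gold labels are already aligned lists of Python booleans; the `accuracy` helper is hypothetical and not part of any official BoolQ or SuperGLUE tooling:

```python
from typing import Sequence


def accuracy(predictions: Sequence[bool], gold: Sequence[bool]) -> float:
    """Percentage of questions whose predicted yes/no answer matches the gold label.

    Hypothetical helper for illustration; assumes predictions[i] answers the
    same question as gold[i].
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold labels must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count exact matches between predicted and gold booleans.
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


# Toy example: 3 of 4 answers are correct.
print(accuracy([True, False, True, True], [True, False, False, True]))  # 75.0
```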

Results

| # | Model | Accuracy (%) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | Mistral-Nemo 12B (HPT) | 99.87 | No | Hierarchical Prompting Taxonomy: A Universal Eva... | 2024-06-18 | Yes |
| 2 | ST-MoE-32B 269B (fine-tuned) | 92.4 | No | ST-MoE: Designing Stable and Transferable Sparse... | 2022-02-17 | Yes |
| 3 | PaLM 540B (fine-tuned) | 92.2 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Yes |
| 4 | Turing NLR v5 XXL 5.4B (fine-tuned) | 92 | No | Toward Efficient Language Model Pretraining and ... | 2022-12-04 | No |
| 5 | T5-XXL 11B (fine-tuned) | 91.2 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Yes |
| 6 | PaLM 2-L (1-shot) | 90.9 | No | PaLM 2 Technical Report | 2023-05-17 | Yes |
| 7 | UL2 20B (fine-tuned) | 90.8 | No | UL2: Unifying Language Learning Paradigms | 2022-05-10 | Yes |
| 8 | Vega v2 6B (fine-tuned) | 90.5 | No | Toward Efficient Language Model Pretraining and ... | 2022-12-04 | No |
| 9 | DeBERTa-1.5B | 90.4 | No | DeBERTa: Decoding-enhanced BERT with Disentangle... | 2020-06-05 | Yes |
| 10 | PaLM 2-M (1-shot) | 88.6 | No | PaLM 2 Technical Report | 2023-05-17 | Yes |
| 11 | ST-MoE-L 4.1B (fine-tuned) | 88.6 | No | ST-MoE: Designing Stable and Transferable Sparse... | 2022-02-17 | Yes |
| 12 | PaLM 2-S (1-shot) | 88.1 | No | PaLM 2 Technical Report | 2023-05-17 | Yes |
| 13 | MUPPET Roberta Large | 87.5 | No | Muppet: Massive Multi-task Representations with ... | 2021-01-26 | Yes |
| 14 | FLAN 137B (prompt-tuned) | 86.3 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Yes |
| 15 | RoBERTa-large 355M + Entailment as Few-shot Learner | 86 | No | Entailment as Few-Shot Learner | 2021-04-29 | Yes |
| 16 | T5-Large 770M (fine-tuned) | 85.4 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Yes |
| 17 | LLaMA 65B (0-shot) | 85.3 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Yes |
| 18 | LLaMA 2 70B (0-shot) | 85 | No | Llama 2: Open Foundation and Fine-Tuned Chat Mod... | 2023-07-18 | Yes |
| 19 | FLAN 137B (4-shot) | 84.6 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Yes |
| 20 | MUPPET Roberta Base | 83.8 | No | Muppet: Massive Multi-task Representations with ... | 2021-01-26 | Yes |
| 21 | Chinchilla 70B (0-shot) | 83.7 | No | Training Compute-Optimal Large Language Models | 2022-03-29 | Yes |
| 22 | LLaMA 2 34B (0-shot) | 83.7 | No | Llama 2: Open Foundation and Fine-Tuned Chat Mod... | 2023-07-18 | Yes |
| 23 | LLaMA 33B (0-shot) | 83.1 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Yes |
| 24 | FLAN 137B (0-shot) | 82.9 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Yes |
| 25 | LLaMA 2 13B (0-shot) | 81.7 | No | Llama 2: Open Foundation and Fine-Tuned Chat Mod... | 2023-07-18 | Yes |
| 26 | T5-Base 220M (fine-tuned) | 81.4 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Yes |
| 27 | BERT-MultiNLI 340M (fine-tuned) | 80.4 | No | BoolQ: Exploring the Surprising Difficulty of Na... | 2019-05-24 | Yes |
| 28 | Gopher (zero-shot) | 79.3 | No | Scaling Language Models: Methods, Analysis & Ins... | 2021-12-08 | Yes |
| 29 | LLaMA 13B (zero-shot) | 78.1 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Yes |
| 30 | LLaMA 2 7B (zero-shot) | 77.4 | No | Llama 2: Open Foundation and Fine-Tuned Chat Mod... | 2023-07-18 | Yes |
| 31 | LLaMA-2 13B + MixLoRA | 77.1 | No | MixLoRA: Enhancing Large Language Models Fine-Tu... | 2024-04-22 | Yes |
| 32 | LLaMA 7B (zero-shot) | 76.5 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Yes |
| 33 | T5-Small 60M (fine-tuned) | 76.4 | No | Exploring the Limits of Transfer Learning with a... | 2019-10-23 | Yes |
| 34 | GPT-3 175B (few-shot, k=32) | 76.4 | No | Language Models are Few-Shot Learners | 2020-05-28 | Yes |
| 35 | BiDAF-MultiNLI (fine-tuned) | 75.57 | No | BoolQ: Exploring the Surprising Difficulty of Na... | 2019-05-24 | Yes |
| 36 | LLaMA-3 8B + MixLoRA | 75 | No | MixLoRA: Enhancing Large Language Models Fine-Tu... | 2024-04-22 | Yes |
| 37 | Bloomberg GPT 50B (1-shot) | 74.6 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Yes |
| 38 | LLaMA3+MoSLoRA | 74.6 | No | Mixture-of-Subspaces in Low-Rank Adaptation | 2024-06-16 | Yes |
| 39 | GPT-1 117M (fine-tuned) | 72.87 | No | BoolQ: Exploring the Surprising Difficulty of Na... | 2019-05-24 | Yes |
| 40 | LLaMA-2 7B + MixLoRA | 72.7 | No | MixLoRA: Enhancing Large Language Models Fine-Tu... | 2024-04-22 | Yes |
| 41 | BiDAF + ELMo (fine-tuned) | 71.41 | No | BoolQ: Exploring the Surprising Difficulty of Na... | 2019-05-24 | Yes |
| 42 | OPT-IML 175B | 71.4 | No | OPT-IML: Scaling Language Model Instruction Meta... | 2022-12-22 | Yes |
| 43 | AlexaTM 20B | 69.4 | No | AlexaTM 20B: Few-Shot Learning Using a Large-Sca... | 2022-08-02 | Yes |
| 44 | Neo-6B (QA + WS) | 67.2 | No | Ask Me Anything: A simple strategy for prompting... | 2022-10-05 | Yes |
| 45 | OPT-IML 30B | 66.9 | No | OPT-IML: Scaling Language Model Instruction Meta... | 2022-12-22 | Yes |
| 46 | Neo-6B (few-shot) | 66.5 | No | Ask Me Anything: A simple strategy for prompting... | 2022-10-05 | Yes |
| 47 | N-Grammer 343M | 65 | No | N-Grammer: Augmenting Transformers with latent n... | 2022-07-13 | Yes |
| 48 | Neo-6B (QA) | 64.9 | No | Ask Me Anything: A simple strategy for prompting... | 2022-10-05 | Yes |
| 49 | OPT 30B (0-shot) | 64 | No | OPT-IML: Scaling Language Model Instruction Meta... | 2022-12-22 | Yes |
| 50 | UL2 20B (0-shot) | 63.1 | No | UL2: Unifying Language Learning Paradigms | 2022-05-10 | Yes |
| 51 | Majority baseline | 62.17 | No | BoolQ: Exploring the Surprising Difficulty of Na... | 2019-05-24 | Yes |
| 52 | Hybrid H3 1.3B (0-shot, logit scoring) | 61.7 | No | Hungry Hungry Hippos: Towards Language Modeling ... | 2022-12-28 | Yes |
| 53 | OPT-IML 1.3B (0-shot) | 61.5 | No | OPT-IML: Scaling Language Model Instruction Meta... | 2022-12-22 | Yes |
| 54 | Shakti-LLM (2.5B) | 61.1 | No | SHAKTI: A 2.5 Billion Parameter Small Language M... | 2024-10-15 | No |
| 55 | Hybrid H3 2.7B (3-shot, logit scoring) | 60.6 | No | Hungry Hungry Hippos: Towards Language Modeling ... | 2022-12-28 | Yes |
| 56 | OPT 1.3B (zero-shot) | 60.5 | No | OPT-IML: Scaling Language Model Instruction Meta... | 2022-12-22 | Yes |
| 57 | GPT-3 175B (0-shot) | 60.5 | No | Language Models are Few-Shot Learners | 2020-05-28 | Yes |
| 58 | OPT 175B | 60.1 | No | OPT-IML: Scaling Language Model Instruction Meta... | 2022-12-22 | Yes |
| 59 | Hybrid H3 125M (0-shot, logit scoring) | 59.6 | No | Hungry Hungry Hippos: Towards Language Modeling ... | 2022-12-28 | Yes |
| 60 | OPT 66B (1-shot) | 57.5 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Yes |
| 61 | Hybrid H3 125M (3-shot, logit scoring) | 56.1 | No | Hungry Hungry Hippos: Towards Language Modeling ... | 2022-12-28 | Yes |
| 62 | Hybrid H3 125M (3-shot, rank classification) | 56.1 | No | Hungry Hungry Hippos: Towards Language Modeling ... | 2022-12-28 | Yes |
| 63 | BLOOM 176B (1-shot) | 52.9 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Yes |
| 64 | Hyena | 51.8 | No | Hyena Hierarchy: Towards Larger Convolutional La... | 2023-02-21 | Yes |
| 65 | GPT-NeoX 20B (1-shot) | 46.4 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Yes |
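The "Majority baseline" row is the accuracy of always giving the same answer, namely the most frequent label. A model that scores below it does no better than that constant predictor, although the comparison is only meaningful when both numbers come from the same evaluation split. A minimal sketch, assuming the majority label is chosen from training labels and scored on a separate evaluation split; the `majority_baseline` helper and the toy labels are hypothetical:

```python
from collections import Counter
from typing import Sequence


def majority_baseline(train_labels: Sequence[bool], eval_labels: Sequence[bool]) -> float:
    """Accuracy (%) of always predicting the most frequent training label.

    Hypothetical helper for illustration; assumes both splits are non-empty.
    """
    if not train_labels or not eval_labels:
        raise ValueError("both label lists must be non-empty")
    # Pick the label that occurs most often in the training split.
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    # Score the constant prediction against every evaluation label.
    correct = sum(label == majority_label for label in eval_labels)
    return 100.0 * correct / len(eval_labels)


# Toy example: True is the majority training label; 3 of 5 eval labels are True.
print(majority_baseline([True, True, False], [True, False, True, True, False]))  # 60.0
```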