Question Answering on BoolQ
Metric: Accuracy (higher is better)
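Accuracy here is the percentage of BoolQ yes/no questions that a model answers correctly. The following is a minimal sketch that assumes predictions and gold answers are already available as Python booleans (True for "yes"). The function name `boolq_accuracy` is hypothetical and is not part of any official BoolQ tooling.

```python
def boolq_accuracy(predictions: list[bool], labels: list[bool]) -> float:
    """Return accuracy in percent for yes/no (True/False) answers.

    Hypothetical helper: assumes one boolean prediction per gold label.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count positions where the predicted answer matches the gold answer.
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)


preds = [True, False, True, True]
gold = [True, False, False, True]
print(boolq_accuracy(preds, gold))  # 75.0
```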
| # | Model | Accuracy (%) | Extra data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | Mistral-Nemo 12B (HPT) * | 99.87 | No | Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models Aligned with Human Cognitive Principles | 2024-06-18 | Yes |
| 2 | ST-MoE-32B 269B (fine-tuned) * | 92.4 | No | ST-MoE: Designing Stable and Transferable Sparse Expert Models | 2022-02-17 | Yes |
| 3 | PaLM 540B (fine-tuned) | 92.2 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Yes |
| 4 | Turing NLR v5 XXL 5.4B (fine-tuned) | 92 | No | Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE | 2022-12-04 | No |
| 5 | T5-XXL 11B (fine-tuned) * | 91.2 | No | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 2019-10-23 | Yes |
| 6 | PaLM 2-L (1-shot) | 90.9 | No | PaLM 2 Technical Report | 2023-05-17 | Yes |
| 7 | UL2 20B (fine-tuned) | 90.8 | No | UL2: Unifying Language Learning Paradigms | 2022-05-10 | Yes |
| 8 | Vega v2 6B (fine-tuned) | 90.5 | No | Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE | 2022-12-04 | No |
| 9 | DeBERTa-1.5B | 90.4 | No | DeBERTa: Decoding-enhanced BERT with Disentangled Attention | 2020-06-05 | Yes |
| 10 | PaLM 2-M (1-shot) | 88.6 | No | PaLM 2 Technical Report | 2023-05-17 | Yes |
| 11 | ST-MoE-L 4.1B (fine-tuned) | 88.6 | No | ST-MoE: Designing Stable and Transferable Sparse Expert Models | 2022-02-17 | Yes |
| 12 | PaLM 2-S (1-shot) | 88.1 | No | PaLM 2 Technical Report | 2023-05-17 | Yes |
| 13 | MUPPET RoBERTa Large | 87.5 | No | Muppet: Massive Multi-task Representations with Pre-Finetuning | 2021-01-26 | Yes |
| 14 | FLAN 137B (prompt-tuned) | 86.3 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Yes |
| 15 | RoBERTa-large 355M + Entailment as Few-shot Learner | 86 | No | Entailment as Few-Shot Learner | 2021-04-29 | Yes |
| 16 | T5-Large 770M (fine-tuned) | 85.4 | No | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 2019-10-23 | Yes |
| 17 | LLaMA 65B (0-shot) | 85.3 | No | LLaMA: Open and Efficient Foundation Language Models | 2023-02-27 | Yes |
| 18 | LLaMA 2 70B (0-shot) | 85 | No | Llama 2: Open Foundation and Fine-Tuned Chat Models | 2023-07-18 | Yes |
| 19 | FLAN 137B (4-shot) | 84.6 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Yes |
| 20 | MUPPET RoBERTa Base | 83.8 | No | Muppet: Massive Multi-task Representations with Pre-Finetuning | 2021-01-26 | Yes |
| 21 | Chinchilla 70B (0-shot) | 83.7 | No | Training Compute-Optimal Large Language Models | 2022-03-29 | Yes |
| 22 | LLaMA 2 34B (0-shot) | 83.7 | No | Llama 2: Open Foundation and Fine-Tuned Chat Models | 2023-07-18 | Yes |
| 23 | LLaMA 33B (0-shot) | 83.1 | No | LLaMA: Open and Efficient Foundation Language Models | 2023-02-27 | Yes |
| 24 | FLAN 137B (0-shot) | 82.9 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Yes |
| 25 | LLaMA 2 13B (0-shot) | 81.7 | No | Llama 2: Open Foundation and Fine-Tuned Chat Models | 2023-07-18 | Yes |
| 26 | T5-Base 220M (fine-tuned) | 81.4 | No | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 2019-10-23 | Yes |
| 27 | BERT-MultiNLI 340M (fine-tuned) * | 80.4 | No | BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions | 2019-05-24 | Yes |
| 28 | Gopher (0-shot) | 79.3 | No | Scaling Language Models: Methods, Analysis & Insights from Training Gopher | 2021-12-08 | Yes |
| 29 | LLaMA 13B (0-shot) | 78.1 | No | LLaMA: Open and Efficient Foundation Language Models | 2023-02-27 | Yes |
| 30 | LLaMA 2 7B (0-shot) | 77.4 | No | Llama 2: Open Foundation and Fine-Tuned Chat Models | 2023-07-18 | Yes |
| 31 | LLaMA-2 13B + MixLoRA | 77.1 | No | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | 2024-04-22 | Yes |
| 32 | LLaMA 7B (0-shot) | 76.5 | No | LLaMA: Open and Efficient Foundation Language Models | 2023-02-27 | Yes |
| 33 | T5-Small 60M (fine-tuned) | 76.4 | No | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 2019-10-23 | Yes |
| 34 | GPT-3 175B (few-shot, k=32) | 76.4 | No | Language Models are Few-Shot Learners | 2020-05-28 | Yes |
| 35 | BiDAF-MultiNLI (fine-tuned) | 75.57 | No | BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions | 2019-05-24 | Yes |
| 36 | LLaMA-3 8B + MixLoRA | 75 | No | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | 2024-04-22 | Yes |
| 37 | BloombergGPT 50B (1-shot) | 74.6 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Yes |
| 38 | LLaMA-3 + MoSLoRA | 74.6 | No | Mixture-of-Subspaces in Low-Rank Adaptation | 2024-06-16 | Yes |
| 39 | GPT-1 117M (fine-tuned) | 72.87 | No | BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions | 2019-05-24 | Yes |
| 40 | LLaMA-2 7B + MixLoRA | 72.7 | No | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | 2024-04-22 | Yes |
| 41 | BiDAF + ELMo (fine-tuned) | 71.41 | No | BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions | 2019-05-24 | Yes |
| 42 | OPT-IML 175B | 71.4 | No | OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization | 2022-12-22 | Yes |
| 43 | AlexaTM 20B | 69.4 | No | AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model | 2022-08-02 | Yes |
| 44 | Neo-6B (QA + WS) | 67.2 | No | Ask Me Anything: A simple strategy for prompting language models | 2022-10-05 | Yes |
| 45 | OPT-IML 30B | 66.9 | No | OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization | 2022-12-22 | Yes |
| 46 | Neo-6B (few-shot) | 66.5 | No | Ask Me Anything: A simple strategy for prompting language models | 2022-10-05 | Yes |
| 47 | N-Grammer 343M | 65 | No | N-Grammer: Augmenting Transformers with latent n-grams | 2022-07-13 | Yes |
| 48 | Neo-6B (QA) | 64.9 | No | Ask Me Anything: A simple strategy for prompting language models | 2022-10-05 | Yes |
| 49 | OPT 30B (0-shot) | 64 | No | OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization | 2022-12-22 | Yes |
| 50 | UL2 20B (0-shot) | 63.1 | No | UL2: Unifying Language Learning Paradigms | 2022-05-10 | Yes |
| 51 | Majority baseline | 62.17 | No | BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions | 2019-05-24 | Yes |
| 52 | Hybrid H3 1.3B (0-shot, logit scoring) | 61.7 | No | Hungry Hungry Hippos: Towards Language Modeling with State Space Models | 2022-12-28 | Yes |
| 53 | OPT-IML 1.3B (0-shot) | 61.5 | No | OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization | 2022-12-22 | Yes |
| 54 | Shakti-LLM (2.5B) | 61.1 | No | SHAKTI: A 2.5 Billion Parameter Small Language Model Optimized for Edge AI and Low-Resource Environments | 2024-10-15 | No |
| 55 | Hybrid H3 2.7B (3-shot, logit scoring) | 60.6 | No | Hungry Hungry Hippos: Towards Language Modeling with State Space Models | 2022-12-28 | Yes |
| 56 | OPT 1.3B (0-shot) | 60.5 | No | OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization | 2022-12-22 | Yes |
| 57 | GPT-3 175B (0-shot) | 60.5 | No | Language Models are Few-Shot Learners | 2020-05-28 | Yes |
| 58 | OPT 175B | 60.1 | No | OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization | 2022-12-22 | Yes |
| 59 | Hybrid H3 125M (0-shot, logit scoring) | 59.6 | No | Hungry Hungry Hippos: Towards Language Modeling with State Space Models | 2022-12-28 | Yes |
| 60 | OPT 66B (1-shot) | 57.5 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Yes |
| 61 | Hybrid H3 125M (3-shot, logit scoring) | 56.1 | No | Hungry Hungry Hippos: Towards Language Modeling with State Space Models | 2022-12-28 | Yes |
| 62 | Hybrid H3 125M (3-shot, rank classification) | 56.1 | No | Hungry Hungry Hippos: Towards Language Modeling with State Space Models | 2022-12-28 | Yes |
| 63 | BLOOM 176B (1-shot) | 52.9 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Yes |
| 64 | Hyena | 51.8 | No | Hyena Hierarchy: Towards Larger Convolutional Language Models | 2023-02-21 | Yes |
| 65 | GPT-NeoX 20B (1-shot) | 46.4 | No | BloombergGPT: A Large Language Model for Finance | 2023-03-30 | Yes |

* Flagged SOTA on the leaderboard: each of these entries set a new best accuracy as of its publication date.
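The Majority baseline row (62.17) is the accuracy of always giving the most frequent answer. Any entry that scores below it does worse than a constant predictor, from Hybrid H3 1.3B at 61.7 down to GPT-NeoX 20B (1-shot) at 46.4. The following is a minimal sketch of how such a baseline can be computed. It assumes the majority label is taken from training labels and then scored on evaluation labels. The function name `majority_baseline` and the toy labels are hypothetical and are not BoolQ data.

```python
from collections import Counter


def majority_baseline(train_labels: list[bool], eval_labels: list[bool]) -> float:
    """Accuracy (%) of always predicting the most frequent training label.

    Hypothetical helper for illustration; not the BoolQ authors' code.
    """
    if not train_labels or not eval_labels:
        raise ValueError("label lists must be non-empty")
    # most_common(1) returns [(label, count)] for the most frequent label.
    majority_label = Counter(train_labels).most_common(1)[0][0]
    # The baseline is correct exactly when the gold label equals the majority label.
    correct = sum(label == majority_label for label in eval_labels)
    return 100.0 * correct / len(eval_labels)


# Toy labels (not BoolQ data): the training set is mostly True,
# so the baseline answers True for every evaluation question.
train = [True, True, True, False, False]
evaluation = [True, False, True, True, False, True, True, False]
print(majority_baseline(train, evaluation))  # 62.5
```

Because this baseline's accuracy equals the share of the majority label in the evaluation set, it is a more informative reference point than 50% whenever the yes/no labels are imbalanced.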