Question Answering on BoolQ

Metric: Accuracy (higher is better)

Results

#	Model	Accuracy (%)	Extra Data	Paper	Date	Code
1	Mistral-Nemo 12B (HPT)	99.87	No	Hierarchical Prompting Taxonomy: A Universal Eva...	2024-06-18	Code
2	ST-MoE-32B 269B (fine-tuned)	92.4	No	ST-MoE: Designing Stable and Transferable Sparse...	2022-02-17	Code
3	PaLM 540B (fine-tuned)	92.2	No	PaLM: Scaling Language Modeling with Pathways	2022-04-05	Code
4	Turing NLR v5 XXL 5.4B (fine-tuned)	92	No	Toward Efficient Language Model Pretraining and ...	2022-12-04	-
5	T5-XXL 11B (fine-tuned)	91.2	No	Exploring the Limits of Transfer Learning with a...	2019-10-23	Code
6	PaLM 2-L (1-shot)	90.9	No	PaLM 2 Technical Report	2023-05-17	Code
7	UL2 20B (fine-tuned)	90.8	No	UL2: Unifying Language Learning Paradigms	2022-05-10	Code
8	Vega v2 6B (fine-tuned)	90.5	No	Toward Efficient Language Model Pretraining and ...	2022-12-04	-
9	DeBERTa-1.5B	90.4	No	DeBERTa: Decoding-enhanced BERT with Disentangle...	2020-06-05	Code
10	PaLM 2-M (1-shot)	88.6	No	PaLM 2 Technical Report	2023-05-17	Code
11	ST-MoE-L 4.1B (fine-tuned)	88.6	No	ST-MoE: Designing Stable and Transferable Sparse...	2022-02-17	Code
12	PaLM 2-S (1-shot)	88.1	No	PaLM 2 Technical Report	2023-05-17	Code
13	MUPPET Roberta Large	87.5	No	Muppet: Massive Multi-task Representations with ...	2021-01-26	Code
14	FLAN 137B (prompt-tuned)	86.3	No	Finetuned Language Models Are Zero-Shot Learners	2021-09-03	Code
15	RoBERTa-large 355M + Entailment as Few-shot Learner	86	No	Entailment as Few-Shot Learner	2021-04-29	Code
16	T5-Large 770M (fine-tuned)	85.4	No	Exploring the Limits of Transfer Learning with a...	2019-10-23	Code
17	LLaMA 65B (0-shot)	85.3	No	LLaMA: Open and Efficient Foundation Language Mo...	2023-02-27	Code
18	LLaMA 2 70B (0-shot)	85	No	Llama 2: Open Foundation and Fine-Tuned Chat Mod...	2023-07-18	Code
19	FLAN 137B (4-shot)	84.6	No	Finetuned Language Models Are Zero-Shot Learners	2021-09-03	Code
20	MUPPET Roberta Base	83.8	No	Muppet: Massive Multi-task Representations with ...	2021-01-26	Code
21	Chinchilla 70B (0-shot)	83.7	No	Training Compute-Optimal Large Language Models	2022-03-29	Code
22	LLaMA 2 34B (0-shot)	83.7	No	Llama 2: Open Foundation and Fine-Tuned Chat Mod...	2023-07-18	Code
23	LLaMA 33B (0-shot)	83.1	No	LLaMA: Open and Efficient Foundation Language Mo...	2023-02-27	Code
24	FLAN 137B (0-shot)	82.9	No	Finetuned Language Models Are Zero-Shot Learners	2021-09-03	Code
25	LLaMA 2 13B (0-shot)	81.7	No	Llama 2: Open Foundation and Fine-Tuned Chat Mod...	2023-07-18	Code
26	T5-Base 220M (fine-tuned)	81.4	No	Exploring the Limits of Transfer Learning with a...	2019-10-23	Code
27	BERT-MultiNLI 340M (fine-tuned)	80.4	No	BoolQ: Exploring the Surprising Difficulty of Na...	2019-05-24	Code
28	Gopher (0-shot)	79.3	No	Scaling Language Models: Methods, Analysis & Ins...	2021-12-08	Code
29	LLaMA 13B (0-shot)	78.1	No	LLaMA: Open and Efficient Foundation Language Mo...	2023-02-27	Code
30	LLaMA 2 7B (0-shot)	77.4	No	Llama 2: Open Foundation and Fine-Tuned Chat Mod...	2023-07-18	Code
31	LLaMA-2 13B + MixLoRA	77.1	No	MixLoRA: Enhancing Large Language Models Fine-Tu...	2024-04-22	Code
32	LLaMA 7B (0-shot)	76.5	No	LLaMA: Open and Efficient Foundation Language Mo...	2023-02-27	Code
33	T5-Small 60M (fine-tuned)	76.4	No	Exploring the Limits of Transfer Learning with a...	2019-10-23	Code
34	GPT-3 175B (few-shot, k=32)	76.4	No	Language Models are Few-Shot Learners	2020-05-28	Code
35	BiDAF-MultiNLI (fine-tuned)	75.57	No	BoolQ: Exploring the Surprising Difficulty of Na...	2019-05-24	Code
36	LLaMA-3 8B + MixLoRA	75	No	MixLoRA: Enhancing Large Language Models Fine-Tu...	2024-04-22	Code
37	Bloomberg GPT 50B (1-shot)	74.6	No	BloombergGPT: A Large Language Model for Finance	2023-03-30	Code
38	LLaMA3+MoSLoRA	74.6	No	Mixture-of-Subspaces in Low-Rank Adaptation	2024-06-16	Code
39	GPT-1 117M (fine-tuned)	72.87	No	BoolQ: Exploring the Surprising Difficulty of Na...	2019-05-24	Code
40	LLaMA-2 7B + MixLoRA	72.7	No	MixLoRA: Enhancing Large Language Models Fine-Tu...	2024-04-22	Code
41	BiDAF + ELMo (fine-tuned)	71.41	No	BoolQ: Exploring the Surprising Difficulty of Na...	2019-05-24	Code
42	OPT-IML 175B	71.4	No	OPT-IML: Scaling Language Model Instruction Meta...	2022-12-22	Code
43	AlexaTM 20B	69.4	No	AlexaTM 20B: Few-Shot Learning Using a Large-Sca...	2022-08-02	Code
44	Neo-6B (QA + WS)	67.2	No	Ask Me Anything: A simple strategy for prompting...	2022-10-05	Code
45	OPT-IML 30B	66.9	No	OPT-IML: Scaling Language Model Instruction Meta...	2022-12-22	Code
46	Neo-6B (few-shot)	66.5	No	Ask Me Anything: A simple strategy for prompting...	2022-10-05	Code
47	N-Grammer 343M	65	No	N-Grammer: Augmenting Transformers with latent n...	2022-07-13	Code
48	Neo-6B (QA)	64.9	No	Ask Me Anything: A simple strategy for prompting...	2022-10-05	Code
49	OPT 30B (0-shot)	64	No	OPT-IML: Scaling Language Model Instruction Meta...	2022-12-22	Code
50	UL2 20B (0-shot)	63.1	No	UL2: Unifying Language Learning Paradigms	2022-05-10	Code
51	Majority baseline	62.17	No	BoolQ: Exploring the Surprising Difficulty of Na...	2019-05-24	Code
52	Hybrid H3 1.3B (0-shot, logit scoring)	61.7	No	Hungry Hungry Hippos: Towards Language Modeling ...	2022-12-28	Code
53	OPT-IML 1.3B (0-shot)	61.5	No	OPT-IML: Scaling Language Model Instruction Meta...	2022-12-22	Code
54	Shakti-LLM (2.5B)	61.1	No	SHAKTI: A 2.5 Billion Parameter Small Language M...	2024-10-15	-
55	Hybrid H3 2.7B (3-shot, logit scoring)	60.6	No	Hungry Hungry Hippos: Towards Language Modeling ...	2022-12-28	Code
56	OPT 1.3B (0-shot)	60.5	No	OPT-IML: Scaling Language Model Instruction Meta...	2022-12-22	Code
57	GPT-3 175B (0-shot)	60.5	No	Language Models are Few-Shot Learners	2020-05-28	Code
58	OPT 175B	60.1	No	OPT-IML: Scaling Language Model Instruction Meta...	2022-12-22	Code
59	Hybrid H3 125M (0-shot, logit scoring)	59.6	No	Hungry Hungry Hippos: Towards Language Modeling ...	2022-12-28	Code
60	OPT 66B (1-shot)	57.5	No	BloombergGPT: A Large Language Model for Finance	2023-03-30	Code
61	Hybrid H3 125M (3-shot, logit scoring)	56.1	No	Hungry Hungry Hippos: Towards Language Modeling ...	2022-12-28	Code
62	Hybrid H3 125M (3-shot, rank classification)	56.1	No	Hungry Hungry Hippos: Towards Language Modeling ...	2022-12-28	Code
63	BLOOM 176B (1-shot)	52.9	No	BloombergGPT: A Large Language Model for Finance	2023-03-30	Code
64	Hyena	51.8	No	Hyena Hierarchy: Towards Larger Convolutional La...	2023-02-21	Code
65	GPT-NeoX 20B (1-shot)	46.4	No	BloombergGPT: A Large Language Model for Finance	2023-03-30	Code
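
The accuracy figures in the table are percentages of yes/no questions answered correctly. As a rough illustration of how such a score might be computed, here is a minimal sketch that assumes boolean gold labels and boolean predictions. The function names `accuracy` and `majority_baseline` and the toy labels are hypothetical, not taken from any listed paper or from the BoolQ data. The second function shows what the "Majority baseline" row presumably corresponds to: always predicting the most frequent answer.

```python
from collections import Counter


def accuracy(predictions, labels):
    """Return the percentage of predictions that match the gold labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == g for p, g in zip(predictions, labels))
    return 100.0 * correct / len(labels)


def majority_baseline(train_labels, eval_labels):
    """Score a classifier that always predicts the most common training label."""
    majority_label = Counter(train_labels).most_common(1)[0][0]
    return accuracy([majority_label] * len(eval_labels), eval_labels)


# Toy labels for illustration only (assumption: not real BoolQ data).
toy_train = [True, True, True, False, False]
toy_eval = [True, False, True, True]
toy_preds = [True, True, True, False]

print(accuracy(toy_preds, toy_eval))           # 2 of 4 correct -> 50.0
print(majority_baseline(toy_train, toy_eval))  # always True -> 75.0
```

Because the task is binary and the classes are not balanced, a model has to beat the majority baseline, not 50%, to show it has learned anything. That is the useful reference point for the lower rows of the table.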