Question Answering on TriviaQA

Metric: EM (higher is better)
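
The EM (exact match) metric named above can be sketched as follows. This is a minimal illustration, not the leaderboard's evaluation script: it assumes SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles, collapsing whitespace) and counts a prediction as correct if it matches any accepted gold answer. The function names `normalize_answer`, `exact_match`, and `em_score` are hypothetical, and individual papers listed here may normalize or score differently.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; the page does not specify one.
    """
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    """True if the normalized prediction equals any normalized gold answer."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(gold) for gold in gold_answers)


def em_score(predictions: list[str], references: list[list[str]]) -> float:
    """Percentage of predictions that exactly match one of their gold answers."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return 100.0 * hits / len(predictions)
```

Under these assumptions, a score such as 87.5 would mean that 87.5% of the evaluated questions received an answer identical to one of the gold answers after normalization.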

Results

#	Model	EM	Extra Data	SOTA	Paper	Date	Code
1	Claude 2 (few-shot, k=5)	87.5	No	No	-	-	-
2	GPT-4-0613	87	No	No	-	-	-
3	Claude 1.3 (few-shot, k=5)	86.7	No	No	-	-	-
4	RankRAG-llama3-70b (Zero-Shot, KILT)	86.5	Yes	Yes	RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs	2024-07-02	-
5	PaLM 2-L (one-shot)	86.1	Yes	Yes	PaLM 2 Technical Report	2023-05-17	Code
6	ChatQA-1.5-llama3-70b (Zero-Shot, KILT)	85.6	Yes	No	ChatQA: Surpassing GPT-4 on Conversational QA and RAG	2024-01-18	-
7	LLaMA 2 70B (one-shot)	85	No	No	Llama 2: Open Foundation and Fine-Tuned Chat Models	2023-07-18	Code
8	GPT-4-0613 (Zero-shot)	84.8	No	Yes	GPT-4 Technical Report	2023-03-15	Code
9	RankRAG-llama3-8b (Zero-Shot, KILT)	82.9	Yes	No	RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs	2024-07-02	-
10	PaLM 2-M (one-shot)	81.7	No	No	PaLM 2 Technical Report	2023-05-17	Code
11	PaLM-540B (Few-Shot)	81.4	Yes	Yes	PaLM: Scaling Language Modeling with Pathways	2022-04-05	Code
12	PaLM-540B (One-Shot)	81.4	No	No	PaLM: Scaling Language Modeling with Pathways	2022-04-05	Code
13	ChatQA-1.5-llama3-8B (Zero-Shot, KILT)	81	Yes	No	ChatQA: Surpassing GPT-4 on Conversational QA and RAG	2024-01-18	-
14	GaC(Qwen2-72B-Instruct + Llama-3-70B-Instruct)	79.29	No	No	Breaking the Ceiling of the LLM Community by Treating Token Generation as a Classification for Ensembling	2024-06-18	Code
15	Claude Instant 1.1 (few-shot, k=5)	78.9	No	No	-	-	-
16	code-davinci-002 175B + REPLUG LSR (Few-Shot)	77.3	No	No	REPLUG: Retrieval-Augmented Black-Box Language Models	2023-01-30	Code
17	PaLM-540B (Zero-Shot)	76.9	No	No	PaLM: Scaling Language Modeling with Pathways	2022-04-05	Code
18	code-davinci-002 175B + REPLUG (Few-Shot)	76.8	No	No	REPLUG: Retrieval-Augmented Black-Box Language Models	2023-01-30	Code
19	GLaM 62B/64E (One-shot)	75.8	Yes	Yes	GLaM: Efficient Scaling of Language Models with Mixture-of-Experts	2021-12-13	-
20	GLaM 62B/64E (Few-shot)	75.8	No	No	GLaM: Efficient Scaling of Language Models with Mixture-of-Experts	2021-12-13	-
21	RA-DIT (Zero-Shot)	75.4	Yes	No	RA-DIT: Retrieval-Augmented Dual Instruction Tuning	2023-10-02	-
22	PaLM 2-S (one-shot)	75.2	No	No	PaLM 2 Technical Report	2023-05-17	Code
23	LLaMA 65B (few-shot, k=64)	73	No	No	LLaMA: Open and Efficient Foundation Language Models	2023-02-27	Code
24	FiE+PAQ	72.6	No	No	FiE: Building a Global Probability Space by Leveraging Early Fusion in Encoder for Open-Domain Question Answering	2022-11-18	-
25	LLaMA 65B (few-shot, k=5)	72.6	No	No	LLaMA: Open and Efficient Foundation Language Models	2023-02-27	Code
26	RankRAG-llama3-70b (Zero-Shot, DPR)	72.6	Yes	No	RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs	2024-07-02	-
27	FiD+Distil	72.1	Yes	Yes	Distilling Knowledge from Reader to Retriever for Question Answering	2020-12-08	Code
28	LLaMA 65B (one-shot)	71.6	No	No	LLaMA: Open and Efficient Foundation Language Models	2023-02-27	Code
29	EMDR2	71.4	No	No	End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering	2021-06-09	Code
30	GLaM 62B/64E (Zero-shot)	71.3	No	No	GLaM: Efficient Scaling of Language Models with Mixture-of-Experts	2021-12-13	-
31	GPT-3 175B (Few-Shot)	71.2	Yes	Yes	Language Models are Few-Shot Learners	2020-05-28	Code
32	Mistral 7B (5-shot)	69.9	No	No	Mistral 7B	2023-10-10	Code
33	ChatQA-1.5-llama3-70b (Zero-Shot, DPR)	69	No	No	ChatQA: Surpassing GPT-4 on Conversational QA and RAG	2024-01-18	-
34	LLaMA 65B (zero-shot)	68.2	No	No	LLaMA: Open and Efficient Foundation Language Models	2023-02-27	Code
35	Fusion-in-Decoder (large)	67.6	No	No	Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering	2020-07-02	Code
36	MemoReader	67.21	Yes	No	-	-	-
37	S-Norm	66.37	Yes	Yes	Simple and Effective Multi-Paragraph Reading Comprehension	2017-10-29	Code
38	TOME-2	65.8	No	No	Mention Memory: incorporating textual knowledge into Transformers through entity mention attention	2021-10-12	Code
39	Shakti-LLM (2.5B)	58.2	No	No	SHAKTI: A 2.5 Billion Parameter Small Language Model Optimized for Edge AI and Low-Resource Environments	2024-10-15	-
40	Branch-Train-MiX 4x7B (sampling top-2 experts)	57.1	No	No	Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM	2024-03-12	Code
41	DPR	56.8	No	No	Dense Passage Retrieval for Open-Domain Question Answering	2020-04-10	Code
42	FLAN 137B (zero-shot)	56.7	No	No	Finetuned Language Models Are Zero-Shot Learners	2021-09-03	Code
43	RAG	56.1	No	No	Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks	2020-05-22	Code
44	Reading Twice for NLU	50.56	No	Yes	Dynamic Integration of Background Knowledge in Neural NLU Systems	2017-06-08	-
45	Mnemonic Reader	46.94	No	Yes	Reinforced Mnemonic Reader for Machine Reading Comprehension	2017-05-08	Code
46	ORQA	45	No	No	Latent Retrieval for Weakly Supervised Open Domain Question Answering	2019-06-01	Code
47	MEMEN	43.16	No	No	MEMEN: Multi-layer Embedding with Memory Networks for Machine Comprehension	2017-07-28	-