Question Answering on TriviaQA
Metric: EM (higher is better)
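The page does not say how EM (exact match) is computed for each entry, and submissions may differ in prompt format and answer normalization. As a rough illustration only, the sketch below assumes the normalization commonly used in SQuAD-style evaluation scripts (lowercasing, removing punctuation and English articles, collapsing whitespace) and counts a prediction as correct if it matches any accepted answer alias. The function names are hypothetical and are not taken from any leaderboard tooling.

```python
from __future__ import annotations

import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: this mirrors common SQuAD-style normalization; individual
    leaderboard entries may have used a different protocol.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    """True if the normalized prediction equals any normalized gold alias."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(gold) for gold in gold_answers)


def em_score(predictions: list[str], references: list[list[str]]) -> float:
    """Percentage (0 to 100) of predictions that exactly match a gold alias."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, golds) for p, golds in zip(predictions, references))
    return 100.0 * hits / len(predictions)
```

Under these assumptions, a score such as 87.5 would mean that 87.5% of the evaluated questions received an answer that matched an accepted alias after normalization.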
Results
| # | Model | EM | Extra Data | SOTA | Paper | Date | Code |
|---|-------|----|------------|------|-------|------|------|
| 1 | Claude 2 (few-shot, k=5) | 87.5 | No | - | - | - | - |
| 2 | GPT-4-0613 | 87 | No | - | - | - | - |
| 3 | Claude 1.3 (few-shot, k=5) | 86.7 | No | - | - | - | - |
| 4 | RankRAG-llama3-70b (Zero-Shot, KILT) | 86.5 | Yes | SOTA | RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | 2024-07-02 | - |
| 5 | PaLM 2-L (one-shot) | 86.1 | Yes | SOTA | PaLM 2 Technical Report | 2023-05-17 | Code |
| 6 | ChatQA-1.5-llama3-70b (Zero-Shot, KILT) | 85.6 | Yes | - | ChatQA: Surpassing GPT-4 on Conversational QA and RAG | 2024-01-18 | - |
| 7 | LLaMA 2 70B (one-shot) | 85 | No | - | Llama 2: Open Foundation and Fine-Tuned Chat Models | 2023-07-18 | Code |
| 8 | GPT-4-0613 (Zero-shot) | 84.8 | No | SOTA | GPT-4 Technical Report | 2023-03-15 | Code |
| 9 | RankRAG-llama3-8b (Zero-Shot, KILT) | 82.9 | Yes | - | RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | 2024-07-02 | - |
| 10 | PaLM 2-M (one-shot) | 81.7 | No | - | PaLM 2 Technical Report | 2023-05-17 | Code |
| 11 | PaLM-540B (Few-Shot) | 81.4 | Yes | SOTA | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code |
| 12 | PaLM-540B (One-Shot) | 81.4 | No | - | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code |
| 13 | ChatQA-1.5-llama3-8B (Zero-Shot, KILT) | 81 | Yes | - | ChatQA: Surpassing GPT-4 on Conversational QA and RAG | 2024-01-18 | - |
| 14 | GaC(Qwen2-72B-Instruct + Llama-3-70B-Instruct) | 79.29 | No | - | Breaking the Ceiling of the LLM Community by Treating Token Generation as a Classification for Ensembling | 2024-06-18 | Code |
| 15 | Claude Instant 1.1 (few-shot, k=5) | 78.9 | No | - | - | - | - |
| 16 | code-davinci-002 175B + REPLUG LSR (Few-Shot) | 77.3 | No | - | REPLUG: Retrieval-Augmented Black-Box Language Models | 2023-01-30 | Code |
| 17 | PaLM-540B (Zero-Shot) | 76.9 | No | - | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code |
| 18 | code-davinci-002 175B + REPLUG (Few-Shot) | 76.8 | No | - | REPLUG: Retrieval-Augmented Black-Box Language Models | 2023-01-30 | Code |
| 19 | GLaM 62B/64E (One-shot) | 75.8 | Yes | SOTA | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | 2021-12-13 | - |
| 20 | GLaM 62B/64E (Few-shot) | 75.8 | No | - | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | 2021-12-13 | - |
| 21 | RA-DIT (Zero-Shot) | 75.4 | Yes | - | RA-DIT: Retrieval-Augmented Dual Instruction Tuning | 2023-10-02 | - |
| 22 | PaLM 2-S (one-shot) | 75.2 | No | - | PaLM 2 Technical Report | 2023-05-17 | Code |
| 23 | LLaMA 65B (few-shot, k=64) | 73 | No | - | LLaMA: Open and Efficient Foundation Language Models | 2023-02-27 | Code |
| 24 | FiE+PAQ | 72.6 | No | - | FiE: Building a Global Probability Space by Leveraging Early Fusion in Encoder for Open-Domain Question Answering | 2022-11-18 | - |
| 25 | LLaMA 65B (few-shot, k=5) | 72.6 | No | - | LLaMA: Open and Efficient Foundation Language Models | 2023-02-27 | Code |
| 26 | RankRAG-llama3-70b (Zero-Shot, DPR) | 72.6 | Yes | - | RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | 2024-07-02 | - |
| 27 | FiD+Distil | 72.1 | Yes | SOTA | Distilling Knowledge from Reader to Retriever for Question Answering | 2020-12-08 | Code |
| 28 | LLaMA 65B (one-shot) | 71.6 | No | - | LLaMA: Open and Efficient Foundation Language Models | 2023-02-27 | Code |
| 29 | EMDR2 | 71.4 | No | - | End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering | 2021-06-09 | Code |
| 30 | GLaM 62B/64E (Zero-shot) | 71.3 | No | - | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | 2021-12-13 | - |
| 31 | GPT-3 175B (Few-Shot) | 71.2 | Yes | SOTA | Language Models are Few-Shot Learners | 2020-05-28 | Code |
| 32 | Mistral 7B (5-shot) | 69.9 | No | - | Mistral 7B | 2023-10-10 | Code |
| 33 | ChatQA-1.5-llama3-70b (Zero-Shot, DPR) | 69 | No | - | ChatQA: Surpassing GPT-4 on Conversational QA and RAG | 2024-01-18 | - |
| 34 | LLaMA 65B (zero-shot) | 68.2 | No | - | LLaMA: Open and Efficient Foundation Language Models | 2023-02-27 | Code |
| 35 | Fusion-in-Decoder (large) | 67.6 | No | - | Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering | 2020-07-02 | Code |
| 36 | MemoReader | 67.21 | Yes | - | - | - | - |
| 37 | S-Norm | 66.37 | Yes | SOTA | Simple and Effective Multi-Paragraph Reading Comprehension | 2017-10-29 | Code |
| 38 | TOME-2 | 65.8 | No | - | Mention Memory: incorporating textual knowledge into Transformers through entity mention attention | 2021-10-12 | Code |
| 39 | Shakti-LLM (2.5B) | 58.2 | No | - | SHAKTI: A 2.5 Billion Parameter Small Language Model Optimized for Edge AI and Low-Resource Environments | 2024-10-15 | - |
| 40 | Branch-Train-MiX 4x7B (sampling top-2 experts) | 57.1 | No | - | Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM | 2024-03-12 | Code |
| 41 | DPR | 56.8 | No | - | Dense Passage Retrieval for Open-Domain Question Answering | 2020-04-10 | Code |
| 42 | FLAN 137B (zero-shot) | 56.7 | No | - | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Code |
| 43 | RAG | 56.1 | No | - | Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks | 2020-05-22 | Code |
| 44 | Reading Twice for NLU | 50.56 | No | SOTA | Dynamic Integration of Background Knowledge in Neural NLU Systems | 2017-06-08 | - |
| 45 | Mnemonic Reader | 46.94 | No | SOTA | Reinforced Mnemonic Reader for Machine Reading Comprehension | 2017-05-08 | Code |
| 46 | ORQA | 45 | No | - | Latent Retrieval for Weakly Supervised Open Domain Question Answering | 2019-06-01 | Code |
| 47 | MEMEN | 43.16 | No | - | MEMEN: Multi-layer Embedding with Memory Networks for Machine Comprehension | 2017-07-28 | - |