Question Answering on Natural Questions
Metric: EM (higher is better)
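The page does not say how EM is computed. As a rough sketch, exact match is usually scored by normalizing the predicted answer and each reference answer, then checking for string equality against any reference. The normalization below (lowercasing, stripping punctuation and English articles, collapsing whitespace) follows the common SQuAD-style convention and is an assumption, not something the leaderboard specifies. The function names `normalize_answer`, `exact_match`, and `em_score` are illustrative, and individual papers may normalize differently.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumed SQuAD-style normalization; not taken from the leaderboard itself.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, references: list[str]) -> bool:
    """True if the normalized prediction equals any normalized reference."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(ref) for ref in references)


def em_score(predictions: list[str], references: list[list[str]]) -> float:
    """Percentage of questions whose prediction exactly matches a reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return 100.0 * hits / len(predictions)


if __name__ == "__main__":
    # Hypothetical toy data: the first prediction matches after normalization,
    # the second does not because it contains an extra token.
    preds = ["The Eiffel Tower", "Paris, France"]
    refs = [["Eiffel Tower"], ["Paris"]]
    print(em_score(preds, refs))
```

Under this convention a prediction with any extra or missing token scores zero for that question, which is part of why EM is a strict metric and why scores are only comparable when systems use the same normalization and answer sets.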
Results
| # | Model | EM | Extra Data | Paper | Date | Code | SOTA |
|---|-------|----|------------|-------|------|------|------|
| 1 | Atlas (full, Wiki-dec-2018 index) | 64 | No | Atlas: Few-shot Learning with Retrieval Augmented Language Models | 2022-08-05 | Yes | SOTA |
| 2 | Atlas (full, Wiki-dec-2021+CC index) | 60.4 | No | Atlas: Few-shot Learning with Retrieval Augmented Language Models | 2022-08-05 | Yes | |
| 3 | DPA-RAG | 59.19 | No | Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation | 2024-06-26 | Yes | |
| 4 | FiE | 58.4 | No | 0.8% Nyquist computational ghost imaging via non-experimental deep learning | 2021-08-17 | No | SOTA |
| 5 | R2-D2 (full) | 55.9 | No | R2-D2: A Modular Baseline for Open-Domain Question Answering | 2021-09-08 | Yes | |
| 6 | ReAtt | 54.7 | No | Retrieval as Attention: End-to-end Learning of Retrieval and Reading within a Single Transformer | 2022-12-05 | Yes | |
| 7 | FiD-KD (full) | 54.7 | No | Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering | 2020-07-02 | Yes | SOTA |
| 8 | RankRAG-llama3-70b (Zero-Shot, KILT) | 54.2 | Yes | RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | 2024-07-02 | No | |
| 9 | EMDR^2 | 52.5 | No | End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering | 2021-06-09 | Yes | |
| 10 | FID (full) | 51.4 | No | Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering | 2020-07-02 | Yes | |
| 11 | RankRAG-llama3-8b (Zero-Shot, KILT) | 50.6 | Yes | RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | 2024-07-02 | No | |
| 12 | RankRAG-llama3-70b (Zero-Shot, DPR) | 50 | Yes | RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | 2024-07-02 | No | |
| 13 | ChatQA-1.5-llama3-70b (Zero-Shot, KILT) | 47 | Yes | ChatQA: Surpassing GPT-4 on Conversational QA and RAG | 2024-01-18 | No | |
| 14 | RankRAG-llama3-8b (Zero-Shot, DPR) | 46.1 | Yes | RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | 2024-07-02 | No | |
| 15 | RETRO + DPR (full) | 45.5 | No | Improving language models by retrieving from trillions of tokens | 2021-12-08 | Yes | |
| 16 | code-davinci-002 175B + REPLUG LSR (few-shot) | 45.5 | No | REPLUG: Retrieval-Augmented Black-Box Language Models | 2023-01-30 | Yes | |
| 17 | Atlas (few-shot, k=64, Wiki-Dec-2018 index) | 45.1 | No | Atlas: Few-shot Learning with Retrieval Augmented Language Models | 2022-08-05 | Yes | |
| 18 | code-davinci-002 175B + REPLUG (few-shot) | 44.7 | No | REPLUG: Retrieval-Augmented Black-Box Language Models | 2023-01-30 | Yes | |
| 19 | RAG | 44.5 | No | Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks | 2020-05-22 | Yes | SOTA |
| 20 | ChatQA-1.5-llama3-8b (Zero-Shot, KILT) | 42.7 | Yes | ChatQA: Surpassing GPT-4 on Conversational QA and RAG | 2024-01-18 | No | |
| 21 | Blended RAG | 42.63 | No | Blended RAG: Improving RAG (Retriever-Augmented Generation) Accuracy with Semantic Search and Hybrid Query-Based Retrievers | 2024-03-22 | Yes | |
| 22 | Atlas (few-shot, k=64, Wiki-dec-2021+CC index) | 42.4 | No | Atlas: Few-shot Learning with Retrieval Augmented Language Models | 2022-08-05 | Yes | |
| 23 | DPR | 41.5 | No | Dense Passage Retrieval for Open-Domain Question Answering | 2020-04-10 | Yes | SOTA |
| 24 | REALM | 40.4 | No | REALM: Retrieval-Augmented Language Model Pre-Training | 2020-02-10 | Yes | SOTA |
| 25 | LLaMA 65B (few-shot, k=64) | 39.9 | No | LLaMA: Open and Efficient Foundation Language Models | 2023-02-27 | Yes | |
| 26 | PaLM-540B (Few-Shot, k=64) | 39.6 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Yes | |
| 27 | PaLM 2-L (one-shot) | 37.5 | No | PaLM 2 Technical Report | 2023-05-17 | Yes | |
| 28 | Chinchilla (few-shot, k=64) | 35.5 | No | Training Compute-Optimal Large Language Models | 2022-03-29 | Yes | |
| 29 | LLaMA 65B (few-shot, k=5) | 35 | No | LLaMA: Open and Efficient Foundation Language Models | 2023-02-27 | Yes | |
| 30 | Search-o1 | 34 | No | Search-o1: Agentic Search-Enhanced Large Reasoning Models | 2025-01-09 | Yes | |
| 31 | LLaMA 2 70B (one-shot) | 33 | No | Llama 2: Open Foundation and Fine-Tuned Chat Models | 2023-07-18 | Yes | |
| 32 | GLaM 62B/64E (Few-Shot) | 32.5 | No | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | 2021-12-13 | No | |
| 33 | PaLM 2-M (one-shot) | 32 | No | PaLM 2 Technical Report | 2023-05-17 | Yes | |
| 34 | LLaMA 65B (one-shot) | 31 | No | LLaMA: Open and Efficient Foundation Language Models | 2023-02-27 | Yes | |
| 35 | GPT-3 175B (Few-Shot, k=64) | 29.9 | No | Language Models are Few-Shot Learners | 2020-05-28 | Yes | |
| 36 | PaLM-540B (One-Shot) | 29.3 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Yes | |
| 37 | Mistral 7B (5-shot) | 28.8 | No | Mistral 7B | 2023-10-10 | Yes | |
| 38 | Gopher (few-shot, k=64) | 28.2 | No | Scaling Language Models: Methods, Analysis & Insights from Training Gopher | 2021-12-08 | Yes | |
| 39 | GLaM 62B/64E (One-Shot) | 26.3 | No | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | 2021-12-13 | No | |
| 40 | LLaMA 7B (Contriever) | 26.07 | No | No paper | - | No | |
| 41 | PaLM 2-S (one-shot) | 25.3 | No | PaLM 2 Technical Report | 2023-05-17 | Yes | |
| 42 | LLaMA 33B (zero-shot) | 24.9 | No | LLaMA: Open and Efficient Foundation Language Models | 2023-02-27 | Yes | |
| 43 | GLaM 62B/64E (Zero-Shot) | 24.7 | No | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | 2021-12-13 | No | |
| 44 | PaLM-540B (Zero-Shot) | 21.2 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Yes | |
| 45 | Neo-6B (QA) | 19.7 | No | Ask Me Anything: A simple strategy for prompting language models | 2022-10-05 | Yes | |
| 46 | Neo-6B (QA + WS) | 19.6 | No | Ask Me Anything: A simple strategy for prompting language models | 2022-10-05 | Yes | |
| 47 | Neo-6B (Few-Shot) | 13.7 | No | Ask Me Anything: A simple strategy for prompting language models | 2022-10-05 | Yes | |