Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Question Answering on Natural Questions

Metric: Exact Match (EM); higher is better
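
As an illustration of the metric, here is a minimal sketch of how an EM score is commonly computed. It assumes SQuAD-style answer normalization (lowercasing, then stripping punctuation, English articles, and extra whitespace). The leaderboard does not state which evaluation script each paper used, so the function names and the normalization steps below are assumptions, not the official scorer.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; individual papers may differ.
    """
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, references: list[str]) -> bool:
    """True if the prediction equals any reference answer after normalization."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(ref) for ref in references)


def em_score(predictions: list[str], references: list[list[str]]) -> float:
    """Percentage of questions whose prediction exactly matches a reference."""
    if not predictions or len(predictions) != len(references):
        raise ValueError("predictions and references must be non-empty and aligned")
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return 100.0 * hits / len(predictions)
```

For example, `em_score(["Paris", "1990"], [["paris"], ["1989", "1991"]])` counts one match out of two questions. Each question may carry several acceptable reference answers, and a prediction scores if it matches any one of them.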


Results

| # | Model | EM | Extra Data | Paper | Date | Code |
|---|-------|----|------------|-------|------|------|
| 1 | Atlas (full, Wiki-dec-2018 index) | 64 | No | Atlas: Few-shot Learning with Retrieval Augmente... | 2022-08-05 | Code |
| 2 | Atlas (full, Wiki-dec-2021+CC index) | 60.4 | No | Atlas: Few-shot Learning with Retrieval Augmente... | 2022-08-05 | Code |
| 3 | DPA-RAG | 59.19 | No | Understand What LLM Needs: Dual Preference Align... | 2024-06-26 | Code |
| 4 | FiE | 58.4 | No | 0.8% Nyquist computational ghost imaging via non... | 2021-08-17 | - |
| 5 | R2-D2 (full) | 55.9 | No | R2-D2: A Modular Baseline for Open-Domain Questi... | 2021-09-08 | Code |
| 6 | ReAtt | 54.7 | No | Retrieval as Attention: End-to-end Learning of R... | 2022-12-05 | Code |
| 7 | FiD-KD (full) | 54.7 | No | Leveraging Passage Retrieval with Generative Mod... | 2020-07-02 | Code |
| 8 | RankRAG-llama3-70b (Zero-Shot, KILT) | 54.2 | Yes | RankRAG: Unifying Context Ranking with Retrieval... | 2024-07-02 | - |
| 9 | EMDR^2 | 52.5 | No | End-to-End Training of Multi-Document Reader and... | 2021-06-09 | Code |
| 10 | FID (full) | 51.4 | No | Leveraging Passage Retrieval with Generative Mod... | 2020-07-02 | Code |
| 11 | RankRAG-llama3-8b (Zero-Shot, KILT) | 50.6 | Yes | RankRAG: Unifying Context Ranking with Retrieval... | 2024-07-02 | - |
| 12 | RankRAG-llama3-70b (Zero-Shot, DPR) | 50 | Yes | RankRAG: Unifying Context Ranking with Retrieval... | 2024-07-02 | - |
| 13 | ChatQA-1.5-llama3-70b (Zero-Shot, KILT) | 47 | Yes | ChatQA: Surpassing GPT-4 on Conversational QA an... | 2024-01-18 | - |
| 14 | RankRAG-llama3-8b (Zero-Shot, DPR) | 46.1 | Yes | RankRAG: Unifying Context Ranking with Retrieval... | 2024-07-02 | - |
| 15 | RETRO + DPR (full) | 45.5 | No | Improving language models by retrieving from tri... | 2021-12-08 | Code |
| 16 | code-davinci-002 175B + REPLUG LSR (few-shot) | 45.5 | No | REPLUG: Retrieval-Augmented Black-Box Language M... | 2023-01-30 | Code |
| 17 | Atlas (few-shot, k=64, Wiki-Dec-2018 index) | 45.1 | No | Atlas: Few-shot Learning with Retrieval Augmente... | 2022-08-05 | Code |
| 18 | code-davinci-002 175B + REPLUG (few-shot) | 44.7 | No | REPLUG: Retrieval-Augmented Black-Box Language M... | 2023-01-30 | Code |
| 19 | RAG | 44.5 | No | Retrieval-Augmented Generation for Knowledge-Int... | 2020-05-22 | Code |
| 20 | ChatQA-1.5-llama3-8b (Zero-Shot, KILT) | 42.7 | Yes | ChatQA: Surpassing GPT-4 on Conversational QA an... | 2024-01-18 | - |
| 21 | Blended RAG | 42.63 | No | Blended RAG: Improving RAG (Retriever-Augmented ... | 2024-03-22 | Code |
| 22 | Atlas (few-shot, k=64, Wiki-dec-2021+CC index) | 42.4 | No | Atlas: Few-shot Learning with Retrieval Augmente... | 2022-08-05 | Code |
| 23 | DPR | 41.5 | No | Dense Passage Retrieval for Open-Domain Question... | 2020-04-10 | Code |
| 24 | REALM | 40.4 | No | REALM: Retrieval-Augmented Language Model Pre-Tr... | 2020-02-10 | Code |
| 25 | LLaMA 65B (few-shot, k=64) | 39.9 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 26 | PaLM-540B (Few-Shot, k=64) | 39.6 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code |
| 27 | PaLM 2-L (one-shot) | 37.5 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 28 | Chinchilla (few-shot, k=64) | 35.5 | No | Training Compute-Optimal Large Language Models | 2022-03-29 | Code |
| 29 | LLaMA 65B (few-shot, k=5) | 35 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 30 | Search-o1 | 34 | No | Search-o1: Agentic Search-Enhanced Large Reasoni... | 2025-01-09 | Code |
| 31 | LLaMA 2 70B (one-shot) | 33 | No | Llama 2: Open Foundation and Fine-Tuned Chat Mod... | 2023-07-18 | Code |
| 32 | GLaM 62B/64E (Few-Shot) | 32.5 | No | GLaM: Efficient Scaling of Language Models with ... | 2021-12-13 | - |
| 33 | PaLM 2-M (one-shot) | 32 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 34 | LLaMA 65B (one-shot) | 31 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 35 | GPT-3 175B (Few-Shot, k=64) | 29.9 | No | Language Models are Few-Shot Learners | 2020-05-28 | Code |
| 36 | PaLM-540B (One-Shot) | 29.3 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code |
| 37 | Mistral 7B (5-shot) | 28.8 | No | Mistral 7B | 2023-10-10 | Code |
| 38 | Gopher (few-shot, k=64) | 28.2 | No | Scaling Language Models: Methods, Analysis & Ins... | 2021-12-08 | Code |
| 39 | GLaM 62B/64E (One-Shot) | 26.3 | No | GLaM: Efficient Scaling of Language Models with ... | 2021-12-13 | - |
| 40 | LLaMA 7B (Contriever) | 26.07 | No | - | - | - |
| 41 | PaLM 2-S (one-shot) | 25.3 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 42 | LLaMA 33B (zero-shot) | 24.9 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 43 | GLaM 62B/64E (Zero-Shot) | 24.7 | No | GLaM: Efficient Scaling of Language Models with ... | 2021-12-13 | - |
| 44 | PaLM-540B (Zero-Shot) | 21.2 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code |
| 45 | Neo-6B (QA) | 19.7 | No | Ask Me Anything: A simple strategy for prompting... | 2022-10-05 | Code |
| 46 | Neo-6B (QA + WS) | 19.6 | No | Ask Me Anything: A simple strategy for prompting... | 2022-10-05 | Code |
| 47 | Neo-6B (Few-Shot) | 13.7 | No | Ask Me Anything: A simple strategy for prompting... | 2022-10-05 | Code |