Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Question Answering on TriviaQA

Metric: EM (higher is better)
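
The page does not say how EM (exact match) is computed for each entry. As a rough sketch only, the snippet below assumes the common SQuAD-style convention: lowercase both strings, strip punctuation and English articles, collapse whitespace, and count a prediction as correct if it equals any accepted reference answer. The function names (`normalize_answer`, `exact_match`, `em_score`) are illustrative and not taken from any listed paper; individual submissions may normalize differently.

```python
import re
import string
from typing import Sequence


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; the leaderboard does not specify one.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, references: Sequence[str]) -> bool:
    """True if the normalized prediction equals any normalized reference."""
    norm = normalize_answer(prediction)
    return any(norm == normalize_answer(ref) for ref in references)


def em_score(
    predictions: Sequence[str], references: Sequence[Sequence[str]]
) -> float:
    """Percentage of questions answered with an exact match (0 to 100)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, r) for p, r in zip(predictions, references))
    return 100.0 * hits / len(predictions)
```

For example, `em_score(["The Beatles", "Paris, France"], [["Beatles"], ["Paris"]])` counts the first answer as a match and the second as a miss, so scores on this scale are directly comparable to the EM column only if the same normalization and reference sets were used.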

Results

| # | Model | EM | Extra Data | Paper | Date | Code |
|---|-------|----|------------|-------|------|------|
| 1 | Claude 2 (few-shot, k=5) | 87.5 | No | - | - | - |
| 2 | GPT-4-0613 | 87 | No | - | - | - |
| 3 | Claude 1.3 (few-shot, k=5) | 86.7 | No | - | - | - |
| 4 | RankRAG-llama3-70b (Zero-Shot, KILT) | 86.5 | Yes | RankRAG: Unifying Context Ranking with Retrieval... | 2024-07-02 | - |
| 5 | PaLM 2-L (one-shot) | 86.1 | Yes | PaLM 2 Technical Report | 2023-05-17 | Code |
| 6 | ChatQA-1.5-llama3-70b (Zero-Shot, KILT) | 85.6 | Yes | ChatQA: Surpassing GPT-4 on Conversational QA an... | 2024-01-18 | - |
| 7 | LLaMA 2 70B (one-shot) | 85 | No | Llama 2: Open Foundation and Fine-Tuned Chat Mod... | 2023-07-18 | Code |
| 8 | GPT-4-0613 (Zero-shot) | 84.8 | No | GPT-4 Technical Report | 2023-03-15 | Code |
| 9 | RankRAG-llama3-8b (Zero-Shot, KILT) | 82.9 | Yes | RankRAG: Unifying Context Ranking with Retrieval... | 2024-07-02 | - |
| 10 | PaLM 2-M (one-shot) | 81.7 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 11 | PaLM-540B (Few-Shot) | 81.4 | Yes | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code |
| 12 | PaLM-540B (One-Shot) | 81.4 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code |
| 13 | ChatQA-1.5-llama3-8B (Zero-Shot, KILT) | 81 | Yes | ChatQA: Surpassing GPT-4 on Conversational QA an... | 2024-01-18 | - |
| 14 | GaC(Qwen2-72B-Instruct + Llama-3-70B-Instruct) | 79.29 | No | Breaking the Ceiling of the LLM Community by Tre... | 2024-06-18 | Code |
| 15 | Claude Instant 1.1 (few-shot, k=5) | 78.9 | No | - | - | - |
| 16 | code-davinci-002 175B + REPLUG LSR (Few-Shot) | 77.3 | No | REPLUG: Retrieval-Augmented Black-Box Language M... | 2023-01-30 | Code |
| 17 | PaLM-540B (Zero-Shot) | 76.9 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code |
| 18 | code-davinci-002 175B + REPLUG (Few-Shot) | 76.8 | No | REPLUG: Retrieval-Augmented Black-Box Language M... | 2023-01-30 | Code |
| 19 | GLaM 62B/64E (One-shot) | 75.8 | Yes | GLaM: Efficient Scaling of Language Models with ... | 2021-12-13 | - |
| 20 | GLaM 62B/64E (Few-shot) | 75.8 | No | GLaM: Efficient Scaling of Language Models with ... | 2021-12-13 | - |
| 21 | RA-DIT (Zero-Shot) | 75.4 | Yes | RA-DIT: Retrieval-Augmented Dual Instruction Tun... | 2023-10-02 | - |
| 22 | PaLM 2-S (one-shot) | 75.2 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 23 | LLaMA 65B (few-shot, k=64) | 73 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 24 | FiE+PAQ | 72.6 | No | FiE: Building a Global Probability Space by Leve... | 2022-11-18 | - |
| 25 | LLaMA 65B (few-shot, k=5) | 72.6 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 26 | RankRAG-llama3-70b (Zero-Shot, DPR) | 72.6 | Yes | RankRAG: Unifying Context Ranking with Retrieval... | 2024-07-02 | - |
| 27 | FiD+Distil | 72.1 | Yes | Distilling Knowledge from Reader to Retriever fo... | 2020-12-08 | Code |
| 28 | LLaMA 65B (one-shot) | 71.6 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 29 | EMDR2 | 71.4 | No | End-to-End Training of Multi-Document Reader and... | 2021-06-09 | Code |
| 30 | GLaM 62B/64E (Zero-shot) | 71.3 | No | GLaM: Efficient Scaling of Language Models with ... | 2021-12-13 | - |
| 31 | GPT-3 175B (Few-Shot) | 71.2 | Yes | Language Models are Few-Shot Learners | 2020-05-28 | Code |
| 32 | Mistral 7B (5-shot) | 69.9 | No | Mistral 7B | 2023-10-10 | Code |
| 33 | ChatQA-1.5-llama3-70b (Zero-Shot, DPR) | 69 | No | ChatQA: Surpassing GPT-4 on Conversational QA an... | 2024-01-18 | - |
| 34 | LLaMA 65B (zero-shot) | 68.2 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 35 | Fusion-in-Decoder (large) | 67.6 | No | Leveraging Passage Retrieval with Generative Mod... | 2020-07-02 | Code |
| 36 | MemoReader | 67.21 | Yes | - | - | - |
| 37 | S-Norm | 66.37 | Yes | Simple and Effective Multi-Paragraph Reading Com... | 2017-10-29 | Code |
| 38 | TOME-2 | 65.8 | No | Mention Memory: incorporating textual knowledge ... | 2021-10-12 | Code |
| 39 | Shakti-LLM (2.5B) | 58.2 | No | SHAKTI: A 2.5 Billion Parameter Small Language M... | 2024-10-15 | - |
| 40 | Branch-Train-MiX 4x7B (sampling top-2 experts) | 57.1 | No | Branch-Train-MiX: Mixing Expert LLMs into a Mixt... | 2024-03-12 | Code |
| 41 | DPR | 56.8 | No | Dense Passage Retrieval for Open-Domain Question... | 2020-04-10 | Code |
| 42 | FLAN 137B (zero-shot) | 56.7 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Code |
| 43 | RAG | 56.1 | No | Retrieval-Augmented Generation for Knowledge-Int... | 2020-05-22 | Code |
| 44 | Reading Twice for NLU | 50.56 | No | Dynamic Integration of Background Knowledge in N... | 2017-06-08 | - |
| 45 | Mnemonic Reader | 46.94 | No | Reinforced Mnemonic Reader for Machine Reading C... | 2017-05-08 | Code |
| 46 | ORQA | 45 | No | Latent Retrieval for Weakly Supervised Open Doma... | 2019-06-01 | Code |
| 47 | MEMEN | 43.16 | No | MEMEN: Multi-layer Embedding with Memory Network... | 2017-07-28 | - |