| 1 | Claude 2 (few-shot, k=5) | 87.5 | No | - | - | - |
| 2 | GPT-4-0613 | 87 | No | - | - | - |
| 3 | Claude 1.3 (few-shot, k=5) | 86.7 | No | - | - | - |
| 4 | RankRAG-llama3-70b (Zero-Shot, KILT) | 86.5 | Yes | RankRAG: Unifying Context Ranking with Retrieval... | 2024-07-02 | - |
| 5 | PaLM 2-L (one-shot) | 86.1 | Yes | PaLM 2 Technical Report | 2023-05-17 | Code |
| 6 | ChatQA-1.5-llama3-70b (Zero-Shot, KILT) | 85.6 | Yes | ChatQA: Surpassing GPT-4 on Conversational QA an... | 2024-01-18 | - |
| 7 | LLaMA 2 70B (one-shot) | 85 | No | Llama 2: Open Foundation and Fine-Tuned Chat Mod... | 2023-07-18 | Code |
| 8 | GPT-4-0613 (Zero-shot) | 84.8 | No | GPT-4 Technical Report | 2023-03-15 | Code |
| 9 | RankRAG-llama3-8b (Zero-Shot, KILT) | 82.9 | Yes | RankRAG: Unifying Context Ranking with Retrieval... | 2024-07-02 | - |
| 10 | PaLM 2-M (one-shot) | 81.7 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 11 | PaLM-540B (Few-Shot) | 81.4 | Yes | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code |
| 12 | PaLM-540B (One-Shot) | 81.4 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code |
| 13 | ChatQA-1.5-llama3-8B (Zero-Shot, KILT) | 81 | Yes | ChatQA: Surpassing GPT-4 on Conversational QA an... | 2024-01-18 | - |
| 14 | GaC (Qwen2-72B-Instruct + Llama-3-70B-Instruct) | 79.29 | No | Breaking the Ceiling of the LLM Community by Tre... | 2024-06-18 | Code |
| 15 | Claude Instant 1.1 (few-shot, k=5) | 78.9 | No | - | - | - |
| 16 | code-davinci-002 175B + REPLUG LSR (Few-Shot) | 77.3 | No | REPLUG: Retrieval-Augmented Black-Box Language M... | 2023-01-30 | Code |
| 17 | PaLM-540B (Zero-Shot) | 76.9 | No | PaLM: Scaling Language Modeling with Pathways | 2022-04-05 | Code |
| 18 | code-davinci-002 175B + REPLUG (Few-Shot) | 76.8 | No | REPLUG: Retrieval-Augmented Black-Box Language M... | 2023-01-30 | Code |
| 19 | GLaM 62B/64E (One-shot) | 75.8 | Yes | GLaM: Efficient Scaling of Language Models with ... | 2021-12-13 | - |
| 20 | GLaM 62B/64E (Few-shot) | 75.8 | No | GLaM: Efficient Scaling of Language Models with ... | 2021-12-13 | - |
| 21 | RA-DIT (Zero-Shot) | 75.4 | Yes | RA-DIT: Retrieval-Augmented Dual Instruction Tun... | 2023-10-02 | - |
| 22 | PaLM 2-S (one-shot) | 75.2 | No | PaLM 2 Technical Report | 2023-05-17 | Code |
| 23 | LLaMA 65B (few-shot, k=64) | 73 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 24 | FiE+PAQ | 72.6 | No | FiE: Building a Global Probability Space by Leve... | 2022-11-18 | - |
| 25 | LLaMA 65B (few-shot, k=5) | 72.6 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 26 | RankRAG-llama3-70b (Zero-Shot, DPR) | 72.6 | Yes | RankRAG: Unifying Context Ranking with Retrieval... | 2024-07-02 | - |
| 27 | FiD+Distil | 72.1 | Yes | Distilling Knowledge from Reader to Retriever fo... | 2020-12-08 | Code |
| 28 | LLaMA 65B (one-shot) | 71.6 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 29 | EMDR2 | 71.4 | No | End-to-End Training of Multi-Document Reader and... | 2021-06-09 | Code |
| 30 | GLaM 62B/64E (Zero-shot) | 71.3 | No | GLaM: Efficient Scaling of Language Models with ... | 2021-12-13 | - |
| 31 | GPT-3 175B (Few-Shot) | 71.2 | Yes | Language Models are Few-Shot Learners | 2020-05-28 | Code |
| 32 | Mistral 7B (5-shot) | 69.9 | No | Mistral 7B | 2023-10-10 | Code |
| 33 | ChatQA-1.5-llama3-70b (Zero-Shot, DPR) | 69 | No | ChatQA: Surpassing GPT-4 on Conversational QA an... | 2024-01-18 | - |
| 34 | LLaMA 65B (zero-shot) | 68.2 | No | LLaMA: Open and Efficient Foundation Language Mo... | 2023-02-27 | Code |
| 35 | Fusion-in-Decoder (large) | 67.6 | No | Leveraging Passage Retrieval with Generative Mod... | 2020-07-02 | Code |
| 36 | MemoReader | 67.21 | Yes | - | - | - |
| 37 | S-Norm | 66.37 | Yes | Simple and Effective Multi-Paragraph Reading Com... | 2017-10-29 | Code |
| 38 | TOME-2 | 65.8 | No | Mention Memory: incorporating textual knowledge ... | 2021-10-12 | Code |
| 39 | Shakti-LLM (2.5B) | 58.2 | No | SHAKTI: A 2.5 Billion Parameter Small Language M... | 2024-10-15 | - |
| 40 | Branch-Train-MiX 4x7B (sampling top-2 experts) | 57.1 | No | Branch-Train-MiX: Mixing Expert LLMs into a Mixt... | 2024-03-12 | Code |
| 41 | DPR | 56.8 | No | Dense Passage Retrieval for Open-Domain Question... | 2020-04-10 | Code |
| 42 | FLAN 137B (zero-shot) | 56.7 | No | Finetuned Language Models Are Zero-Shot Learners | 2021-09-03 | Code |
| 43 | RAG | 56.1 | No | Retrieval-Augmented Generation for Knowledge-Int... | 2020-05-22 | Code |
| 44 | Reading Twice for NLU | 50.56 | No | Dynamic Integration of Background Knowledge in N... | 2017-06-08 | - |
| 45 | Mnemonic Reader | 46.94 | No | Reinforced Mnemonic Reader for Machine Reading C... | 2017-05-08 | Code |
| 46 | ORQA | 45 | No | Latent Retrieval for Weakly Supervised Open Doma... | 2019-06-01 | Code |
| 47 | MEMEN | 43.16 | No | MEMEN: Multi-layer Embedding with Memory Network... | 2017-07-28 | - |