Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

KILT: a Benchmark for Knowledge Intensive Language Tasks

Fabio Petroni, Aleksandra Piktus, Angela Fan, Patrick Lewis, Majid Yazdani, Nicola De Cao, James Thorne, Yacine Jernite, Vladimir Karpukhin, Jean Maillard, Vassilis Plachouras, Tim Rocktäschel, Sebastian Riedel

2020-09-04 · NAACL 2021
Tasks: Question Answering, Entity Linking, Fact Checking, Slot Filling, Open-Domain Question Answering, Open-Domain Dialog, Fact Verification

Abstract

Challenging problems such as open-domain question answering, fact checking, slot filling and entity linking require access to large, external knowledge sources. While some models do well on individual tasks, developing general models is difficult as each task might require computationally expensive indexing of custom knowledge sources, in addition to dedicated infrastructure. To catalyze research on models that condition on specific information in large textual resources, we present a benchmark for knowledge-intensive language tasks (KILT). All tasks in KILT are grounded in the same snapshot of Wikipedia, reducing engineering turnaround through the re-use of components, as well as accelerating research into task-agnostic memory architectures. We test both task-specific and general baselines, evaluating downstream performance in addition to the ability of the models to provide provenance. We find that a shared dense vector index coupled with a seq2seq model is a strong baseline, outperforming more tailor-made approaches for fact checking, open-domain question answering and dialogue, and yielding competitive results on entity linking and slot filling, by generating disambiguated text. KILT data and code are available at https://github.com/facebookresearch/KILT.
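The QA tasks in KILT are scored with standard answer-level metrics such as Exact Match and token-level F1. A minimal sketch of these two metrics (the normalization details here follow common SQuAD-style practice and are illustrative, not necessarily identical to KILT's official evaluation script):

```python
# SQuAD-style Exact Match and token-level F1 (illustrative sketch; the
# normalization steps are common practice, not KILT's exact script).
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))

def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall against the gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Corpus-level numbers like those in the table below are averages of these per-instance scores over a task's test set.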

Results

Task | Dataset | Metric | Value | Model
Question Answering | KILT: ELI5 | F1 | 16.1 | T5-base
Question Answering | KILT: ELI5 | ROUGE-L | 19.08 | T5-base
Question Answering | KILT: ELI5 | F1 | 17.88 | BART+DPR
Question Answering | KILT: ELI5 | ROUGE-L | 17.41 | BART+DPR
Question Answering | KILT: ELI5 | F1 | 14.51 | RAG
Question Answering | KILT: ELI5 | ROUGE-L | 14.05 | RAG
Question Answering | KILT: TriviaQA | EM | 18.11 | T5-base
Question Answering | KILT: TriviaQA | F1 | 27.83 | T5-base
Question Answering | KILT: Natural Questions | EM | 19.6 | T5-base
Question Answering | KILT: Natural Questions | F1 | 27.73 | T5-base
Question Answering | KILT: HotpotQA | EM | 12.64 | T5-base
Question Answering | KILT: HotpotQA | F1 | 19.57 | T5-base
Entity Linking | KILT: WNED-WIKI | Accuracy | 47.13 | T5-base
Entity Linking | KILT: WNED-WIKI | KILT-AC | 47.13 | T5-base
Entity Linking | KILT: WNED-WIKI | R-Prec | 47.13 | T5-base
Entity Linking | KILT: WNED-WIKI | Recall@5 | 47.13 | T5-base
Entity Linking | KILT: AIDA-YAGO2 | Accuracy | 74.05 | T5-base
Entity Linking | KILT: AIDA-YAGO2 | KILT-AC | 74.05 | T5-base
Entity Linking | KILT: AIDA-YAGO2 | R-Prec | 74.05 | T5-base
Entity Linking | KILT: AIDA-YAGO2 | Recall@5 | 74.05 | T5-base
Entity Linking | KILT: WNED-CWEB | Accuracy | 49.29 | T5-base
Entity Linking | KILT: WNED-CWEB | KILT-AC | 49.29 | T5-base
Entity Linking | KILT: WNED-CWEB | R-Prec | 49.29 | T5-base
Entity Linking | KILT: WNED-CWEB | Recall@5 | 49.29 | T5-base
Slot Filling | KILT: T-REx | Accuracy | 43.56 | T5-base
Slot Filling | KILT: T-REx | F1 | 50.61 | T5-base
Slot Filling | KILT: Zero Shot RE | Accuracy | 9.02 | T5-base
Slot Filling | KILT: Zero Shot RE | F1 | 13.52 | T5-base
Fact Verification | KILT: FEVER | Accuracy | 86.31 | RAG
Fact Verification | KILT: FEVER | KILT-AC | 53.45 | RAG
Fact Verification | KILT: FEVER | R-Prec | 61.94 | RAG
Fact Verification | KILT: FEVER | Recall@5 | 75.55 | RAG
Fact Verification | KILT: FEVER | Accuracy | 76.3 | T5-base
Open-Domain Question Answering | KILT: TriviaQA | EM | 18.11 | T5-base
Open-Domain Question Answering | KILT: TriviaQA | F1 | 27.83 | T5-base
Open-Domain Question Answering | KILT: Natural Questions | EM | 19.6 | T5-base
Open-Domain Question Answering | KILT: Natural Questions | F1 | 27.73 | T5-base
Open-Domain Question Answering | KILT: HotpotQA | EM | 12.64 | T5-base
Open-Domain Question Answering | KILT: HotpotQA | F1 | 19.57 | T5-base
Open-Domain Question Answering | KILT: ELI5 | F1 | 14.51 | RAG
Open-Domain Question Answering | KILT: ELI5 | ROUGE-L | 14.05 | RAG
Open-Domain Question Answering | KILT: ELI5 | F1 | 16.1 | T5-base
Open-Domain Question Answering | KILT: ELI5 | ROUGE-L | 19.08 | T5-base
Open-Domain Dialog | KILT: Wizard of Wikipedia | F1 | 13.53 | T5-base
Open-Domain Dialog | KILT: Wizard of Wikipedia | ROUGE-L | 12.4 | T5-base
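The gap between plain Accuracy and KILT-AC in the FEVER rows (86.31 vs. 53.45 for RAG) reflects KILT's combined scoring: the KILT variant of a downstream metric awards credit for an instance only when the model also retrieved the correct provenance, i.e. its R-Precision for that instance is 1. A hedged sketch of that gating, assuming per-instance scores and R-precision values have already been computed:

```python
# KILT-style combined scoring (illustrative sketch): zero out the
# downstream score for every instance whose provenance was wrong
# (R-Precision != 1), then average over the dataset.
from typing import List

def kilt_score(downstream_scores: List[float],
               r_precisions: List[float]) -> float:
    """Mean downstream score, gated by per-instance R-Precision = 1."""
    assert len(downstream_scores) == len(r_precisions)
    gated = [score if rp == 1.0 else 0.0
             for score, rp in zip(downstream_scores, r_precisions)]
    return sum(gated) / len(gated)
```

For example, four instances with accuracies [1, 1, 0, 1] and R-precisions [1, 0, 1, 1] yield a plain accuracy of 0.75 but a KILT-AC of 0.5, since the second correct answer lacked correct provenance.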

Related Papers

- PiMRef: Detecting and Explaining Ever-evolving Spear Phishing Emails with Knowledge Base Invariants (2025-07-21)
- From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
- Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering (2025-07-17)
- Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It (2025-07-17)
- City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
- Describe Anything Model for Visual Question Answering on Text-rich Images (2025-07-16)
- Is This Just Fantasy? Language Model Representations Reflect Human Judgments of Event Plausibility (2025-07-16)
- Warehouse Spatial Question Answering with LLM Agent (2025-07-14)