WikiGUM: Exhaustive Entity Linking for Wikification in 12 Genres
Jessica Lin, Amir Zeldes
Abstract
Previous work on Entity Linking has focused on resources targeting non-nested proper named entity mentions, often in data from Wikipedia, i.e. Wikification. In this paper, we present and evaluate WikiGUM, a fully wikified dataset covering all mentions of named entities, including their non-named and pronominal mentions, as well as mentions nested within other mentions. The dataset covers a broad range of 12 written and spoken genres, most of which have not been included in Entity Linking efforts to date; as a result, a pretrained state-of-the-art (SOTA) system performs poorly on it in our evaluation. The availability of a variety of other annotations for the same data also enables further research on entities in context.
Results
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Entity Linking | GUM | F1 | 26.4 | baseline |