KPI-EDGAR: A Novel Dataset and Accompanying Metric for Relation Extraction from Financial Documents

Tobias Deußer, Syed Musharraf Ali, Lars Hillebrand, Desiana Nurchalifah, Basil Jacob, Christian Bauckhage, Rafet Sifa

2022-10-17

Relation Extraction, Benchmarking, Named Entity Recognition (NER), Joint Entity and Relation Extraction, Retrieval

Abstract

We introduce KPI-EDGAR, a novel dataset for Joint Named Entity Recognition and Relation Extraction building on financial reports uploaded to the Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system, where the main objective is to extract Key Performance Indicators (KPIs) from financial documents and link them to their numerical values and other attributes. We further provide four accompanying baselines for benchmarking potential future research. Additionally, we propose a new way of measuring the success of said extraction process by incorporating a word-level weighting scheme into the conventional F1 score to better model the inherently fuzzy borders of the entity pairs of a relation in this domain.
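
The abstract does not spell out the word-level weighting scheme, so the following is only an illustrative sketch of how partial word overlap could be folded into an F1 score, not the metric defined in the paper. It assumes each relation is a pair of entities given as sets of token indices, scores a predicted relation against a gold relation by the product of the two entities' Jaccard overlaps, and averages the best-match scores to obtain soft precision and recall. The names `jaccard`, `relation_score`, and `soft_relation_f1` are hypothetical.

```python
from typing import FrozenSet, List, Tuple

# Assumption: an entity is the set of token indices it covers,
# and a relation links a KPI entity to a value entity.
Span = FrozenSet[int]
Relation = Tuple[Span, Span]


def jaccard(a: Span, b: Span) -> float:
    """Token-level overlap between two entity spans, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def relation_score(pred: Relation, gold: Relation) -> float:
    """Partial credit for a relation: both entities must overlap to score."""
    return jaccard(pred[0], gold[0]) * jaccard(pred[1], gold[1])


def soft_relation_f1(preds: List[Relation], golds: List[Relation]) -> float:
    """F1 in which each relation contributes its best overlap score
    instead of a hard 0/1 exact-match decision (illustrative only)."""
    if not preds or not golds:
        return 0.0
    precision = sum(max(relation_score(p, g) for g in golds) for p in preds) / len(preds)
    recall = sum(max(relation_score(p, g) for p in preds) for g in golds) / len(golds)
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Gold: a two-token KPI (tokens 0-1) linked to a value at token 5.
gold = [(frozenset({0, 1}), frozenset({5}))]
# Prediction misses the first KPI token but gets the value right.
pred = [(frozenset({1}), frozenset({5}))]
print(soft_relation_f1(pred, gold))
```

Under a conventional exact-match F1 the prediction above would score 0, whereas this sketch assigns it 0.5, which is the kind of tolerance to fuzzy entity borders the abstract describes.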

Results

Task	Dataset	Metric	Value	Model
Relation Extraction	KPI-EDGAR	Relation F1	43.76	KPI-BERT
Information Extraction	KPI-EDGAR	Relation F1	43.76	KPI-BERT
