OpenNER 1.0: Standardized Open-Access Named Entity Recognition Datasets in 50+ Languages

Chester Palen-Michel, Maxwell Pickering, Maya Kruse, Jonne Sälevä, Constantine Lignos

2024-12-12named-entity-recognition Named Entity Recognition Pretrained Multilingual Language Models NER Named Entity Recognition (NER)

Paper PDF

Abstract

We present OpenNER 1.0, a standardized collection of openly available named entity recognition (NER) datasets. OpenNER contains 34 datasets spanning 51 languages, annotated in varying named entity ontologies. We correct annotation format issues, standardize the original datasets into a uniform representation, map entity type names to be more consistent across corpora, and provide the collection in a structure that enables research in multilingual and multi-ontology NER. We provide baseline models using three pretrained multilingual language models to compare the performance of recent models and facilitate future research in NER.

Related Papers

Flippi: End To End GenAI Assistant for E-Commerce2025-07-08 Selecting and Merging: Towards Adaptable and Scalable Named Entity Recognition with Large Language Models2025-06-28 Improving Named Entity Transcription with Contextual LLM-based Revision2025-06-12 Better Semi-supervised Learning for Multi-domain ASR Through Incremental Retraining and Data Filtering2025-06-05 Dissecting Bias in LLMs: A Mechanistic Interpretability Perspective2025-06-05 Efficient Data Selection for Domain Adaptation of ASR Using Pseudo-Labels and Multi-Stage Filtering2025-06-04 EL4NER: Ensemble Learning for Named Entity Recognition via Multiple Small-Parameter Large Language Models2025-05-29 Label-Guided In-Context Learning for Named Entity Recognition2025-05-29