MSNER

Multilingual Spoken Named Entity Recognition

Speech · Texts · CC BY-NC · Introduced 2024-05-20

This dataset contains named entity annotations for European Parliament recordings in Dutch, French, German and Spanish. The entity annotation scheme follows OntoNotes v5. The annotations are built on top of VoxPopuli, the original unannotated dataset.

The training and development sets contain silver-quality annotations. The test set contains human-verified gold-quality annotations. The test set is available on HuggingFace in BIO format: qmeeus/MSNER
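Because the test set is distributed in BIO format, a common first step is to turn the per-token tags into entity spans. The following is a minimal sketch of that conversion, not the authors' tooling: the function name `bio_to_spans`, the example sentence, and its tags are hypothetical, and the sketch makes no assumption about the column names used in the HuggingFace release. The labels in the example (PERSON, ORG, GPE) are OntoNotes v5 entity types.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert BIO tags to (label, start, end) spans; `end` is exclusive."""
    spans: List[Tuple[str, int, int]] = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # The open entity ends on "O", on any "B-" tag, or on an "I-" tag
        # whose label differs from the open entity's label.
        closes = (
            tag == "O"
            or tag.startswith("B-")
            or (tag.startswith("I-") and tag[2:] != label)
        )
        if closes and label is not None:
            spans.append((label, start, i))
            start, label = None, None
        # A "B-" tag opens an entity; a stray "I-" with no open entity
        # is treated leniently as an opening tag.
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            start, label = i, tag[2:]
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


# Hypothetical example sentence, not taken from the dataset.
tokens = ["Angela", "Merkel", "visited", "the", "European",
          "Parliament", "in", "Strasbourg"]
tags = ["B-PERSON", "I-PERSON", "O", "O", "B-ORG", "I-ORG", "O", "B-GPE"]

for label, start, end in bio_to_spans(tags):
    print(label, " ".join(tokens[start:end]))
```

Treating a stray `I-` tag as the start of a new entity is a lenient design choice; a stricter evaluation might instead reject such sequences.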

@inproceedings{MSNER,
author = {Meeus, Quentin and Moens, Marie-Francine and Van hamme, Hugo},
booktitle = {20th Joint ACL-ISO Workshop on Interoperable Semantic Annotation at LREC-COLING},
title = {{MSNER: A Multilingual Speech Dataset for Named Entity Recognition}},
year = {2024}
}