
Cross-Lingual Named Entity Recognition Using Parallel Corpus: A New Approach Using XLM-RoBERTa Alignment

Bing Li, Yujie He, Wenjin Xu

2021-01-26
Tasks: Named Entity Recognition (NER), Cross-Lingual NER, Entity Alignment, Translation

Abstract

We propose a novel approach for zero-shot cross-lingual Named Entity Recognition (NER) transfer using parallel corpora. We built an entity alignment model on top of XLM-RoBERTa to project the entities detected on the English side of the parallel data onto the target-language sentences; the alignment model's accuracy surpasses all previous unsupervised models. With the alignment model, we can obtain a pseudo-labeled NER data set in the target language to train a task-specific model. Unlike translation-based methods, this approach benefits from the natural fluency and nuances of the original target-language corpus. We also propose a modified loss function that is similar to focal loss but assigns weights in the opposite direction, to further improve model training on the noisy pseudo-labeled data set. We evaluated the proposed approach on 4 target languages using benchmark data sets and obtained competitive F1 scores compared to the most recent SOTA models. We also discuss the impact of parallel corpus size and domain on the final transfer performance.
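The abstract does not give the exact form of the modified loss, so the following is only a minimal sketch of one plausible reading. Standard focal loss scales cross-entropy by (1 - p_t)^gamma, which emphasizes hard examples. The sketch assumes that "the opposite direction" means scaling by p_t^gamma instead, so that low-confidence tokens (which are more likely to carry noisy pseudo-labels) contribute less. The function names, the p_t^gamma weighting, and the default gamma are illustrative assumptions, not the authors' published formula.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0):
    """Standard focal loss: down-weights easy (high-confidence) examples."""
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def reverse_focal_loss(logits, target, gamma=2.0):
    """ASSUMED variant: weight by p_t**gamma instead of (1 - p_t)**gamma.

    Tokens the model finds hard (low p_t) are down-weighted, on the
    assumption that many of them are mislabeled in the pseudo-labeled data.
    """
    p_t = softmax(logits)[target]
    return -(p_t ** gamma) * math.log(p_t)


# One token with three classes; the model strongly prefers class 0.
logits = [4.0, 0.0, 0.0]

easy_focal = focal_loss(logits, target=0)
easy_reverse = reverse_focal_loss(logits, target=0)

# Same logits, but the (possibly noisy) label says class 1.
hard_focal = focal_loss(logits, target=1)
hard_reverse = reverse_focal_loss(logits, target=1)
```

With gamma set to 0, both functions reduce to plain cross-entropy. For larger gamma, the assumed variant increasingly ignores tokens whose label disagrees with the model's prediction, which trades some learning signal on genuinely hard tokens for robustness to label noise.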

Results

Task                    Dataset     Metric   Value  Model
Cross-Lingual           CoNLL 2003  Dutch    79.7   XLM-RoBERTa-large
Cross-Lingual           CoNLL 2003  German   76.9   XLM-RoBERTa-large
Cross-Lingual           CoNLL 2003  Spanish  78.9   XLM-RoBERTa-large
Cross-Lingual Transfer  CoNLL 2003  Dutch    79.7   XLM-RoBERTa-large
Cross-Lingual Transfer  CoNLL 2003  German   76.9   XLM-RoBERTa-large
Cross-Lingual Transfer  CoNLL 2003  Spanish  78.9   XLM-RoBERTa-large

Related Papers

A Translation of Probabilistic Event Calculus into Markov Decision Processes (2025-07-17)
Function-to-Style Guidance of LLMs for Code Translation (2025-07-15)
Speak2Sign3D: A Multi-modal Pipeline for English Speech to American Sign Language Animation (2025-07-09)
Pun Intended: Multi-Agent Translation of Wordplay with Contrastive Learning and Phonetic-Semantic Embeddings (2025-07-09)
Flippi: End To End GenAI Assistant for E-Commerce (2025-07-08)
Unconditional Diffusion for Generative Sequential Recommendation (2025-07-08)
GRAFT: A Graph-based Flow-aware Agentic Framework for Document-level Machine Translation (2025-07-04)
TransLaw: Benchmarking Large Language Models in Multi-Agent Simulation of the Collaborative Translation (2025-07-01)