Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Model and Data Transfer for Cross-Lingual Sequence Labelling in Zero-Resource Settings

Iker García-Ferrero, Rodrigo Agerri, German Rigau

2022-10-23 · Machine Translation · Cross-Lingual Transfer · Translation · Cross-Lingual NER

Paper · PDF · Code (official)

Abstract

Zero-resource cross-lingual transfer approaches aim to apply supervised models from a source language to unlabelled target languages. In this paper we perform an in-depth study of the two main techniques employed so far for cross-lingual zero-resource sequence labelling, based either on data or model transfer. Although previous research has proposed translation and annotation projection (data-based cross-lingual transfer) as an effective technique for cross-lingual sequence labelling, in this paper we experimentally demonstrate that high capacity multilingual language models applied in a zero-shot (model-based cross-lingual transfer) setting consistently outperform data-based cross-lingual transfer approaches. A detailed analysis of our results suggests that this might be due to important differences in language use. More specifically, machine translation often generates a textual signal which is different to what the models are exposed to when using gold standard data, which affects both the fine-tuning and evaluation processes. Our results also indicate that data-based cross-lingual transfer approaches remain a competitive option when high-capacity multilingual language models are not available.
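The data-based transfer technique the abstract contrasts against (translation plus annotation projection) can be sketched in a few lines: labels from the source-language sentence are carried over to the machine-translated sentence through a word alignment. This is a minimal illustration under simplifying assumptions (one-to-one alignments, a toy example), not the authors' implementation; the function name and BIO-repair heuristic are my own.

```python
# Illustrative sketch of data-based cross-lingual transfer via annotation
# projection (hypothetical helper, not the paper's code). A sentence labelled
# in the source language is machine-translated, and its BIO entity labels are
# projected onto the target tokens through a word alignment.

def project_labels(src_labels, alignment, n_tgt):
    """Project BIO labels from source to target tokens.

    src_labels : list of BIO tags, one per source token.
    alignment  : list of (src_idx, tgt_idx) word-alignment pairs.
    n_tgt      : number of target tokens.
    """
    tgt_labels = ["O"] * n_tgt
    for src_idx, tgt_idx in alignment:
        label = src_labels[src_idx]
        if label != "O":
            tgt_labels[tgt_idx] = label
    # Repair BIO consistency: an I- tag whose predecessor is not part of the
    # same entity becomes a B- tag (entity start).
    for i, tag in enumerate(tgt_labels):
        if tag.startswith("I-"):
            prev = tgt_labels[i - 1] if i > 0 else "O"
            if prev == "O" or prev[2:] != tag[2:]:
                tgt_labels[i] = "B-" + tag[2:]
    return tgt_labels

# Toy example: "New York is big" -> hypothetical Spanish translation
# "Nueva York es grande", with a one-to-one word alignment.
src = ["B-LOC", "I-LOC", "O", "O"]
alignment = [(0, 0), (1, 1), (2, 2), (3, 3)]
print(project_labels(src, alignment, 4))  # ['B-LOC', 'I-LOC', 'O', 'O']
```

In practice the alignment comes from a word-alignment tool and translations from an MT system, which is exactly where the paper locates the weakness of this pipeline: the translated text differs from gold-standard target-language text, degrading both fine-tuning and evaluation.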

Results

Task | Dataset | Metric | Value | Model
--- | --- | --- | --- | ---
Cross-Lingual | CoNLL Dutch | F1 | 79.7 | XLM-R large
Cross-Lingual | CoNLL German | F1 | 74.5 | XLM-R large
Cross-Lingual | CoNLL Spanish | F1 | 79.5 | XLM-R large
Cross-Lingual | CoNLL 2003 | Dutch | 82.3 | XLM-RoBERTa-large
Cross-Lingual | CoNLL 2003 | German | 74.5 | XLM-RoBERTa-large
Cross-Lingual | CoNLL 2003 | Spanish | 79.5 | XLM-RoBERTa-large
Cross-Lingual Transfer | CoNLL Dutch | F1 | 79.7 | XLM-R large
Cross-Lingual Transfer | CoNLL German | F1 | 74.5 | XLM-R large
Cross-Lingual Transfer | CoNLL Spanish | F1 | 79.5 | XLM-R large
Cross-Lingual Transfer | CoNLL 2003 | Dutch | 82.3 | XLM-RoBERTa-large
Cross-Lingual Transfer | CoNLL 2003 | German | 74.5 | XLM-RoBERTa-large
Cross-Lingual Transfer | CoNLL 2003 | Spanish | 79.5 | XLM-RoBERTa-large

Related Papers

- Enhancing Cross-task Transfer of Large Language Models via Activation Steering (2025-07-17)
- A Translation of Probabilistic Event Calculus into Markov Decision Processes (2025-07-17)
- HanjaBridge: Resolving Semantic Ambiguity in Korean LLMs via Hanja-Augmented Pre-Training (2025-07-15)
- Function-to-Style Guidance of LLMs for Code Translation (2025-07-15)
- Speak2Sign3D: A Multi-modal Pipeline for English Speech to American Sign Language Animation (2025-07-09)
- Pun Intended: Multi-Agent Translation of Wordplay with Contrastive Learning and Phonetic-Semantic Embeddings (2025-07-09)
- Unconditional Diffusion for Generative Sequential Recommendation (2025-07-08)
- GRAFT: A Graph-based Flow-aware Agentic Framework for Document-level Machine Translation (2025-07-04)