XNLI: Evaluating Cross-lingual Sentence Representations

Alexis Conneau, Guillaume Lample, Ruty Rinott, Adina Williams, Samuel R. Bowman, Holger Schwenk, Veselin Stoyanov

2018-09-13 | EMNLP 2018 10 | Machine Translation, Natural Language Inference, Translation, Cross-Lingual Natural Language Inference

Abstract

State-of-the-art natural language processing systems rely on supervision in the form of annotated data to learn competent models. These models are generally trained on data in a single language (usually English), and cannot be directly used beyond that language. Since collecting data in every language is not realistic, there has been a growing interest in cross-lingual language understanding (XLU) and low-resource cross-language transfer. In this work, we construct an evaluation set for XLU by extending the development and test sets of the Multi-Genre Natural Language Inference Corpus (MultiNLI) to 15 languages, including low-resource languages such as Swahili and Urdu. We hope that our dataset, dubbed XNLI, will catalyze research in cross-lingual sentence understanding by providing an informative standard evaluation task. In addition, we provide several baselines for multilingual sentence understanding, including two based on machine translation systems, and two that use parallel data to train aligned multilingual bag-of-words and LSTM encoders. We find that XNLI represents a practical and challenging evaluation suite, and that directly translating the test data yields the best performance among available baselines.
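
The abstract says that directly translating the test data gives the best performance among the baselines. A minimal sketch of that idea, assuming a machine translation function and an English-only NLI classifier are available, might look like the following. The names `translate_test_predict`, `toy_translate`, and `toy_classifier` are hypothetical placeholders for illustration, not the authors' code, and the toy stand-ins exist only so the sketch runs.

```python
from typing import Callable, List, Tuple

# The three NLI classes used by MultiNLI and XNLI.
LABELS = ("entailment", "neutral", "contradiction")


def translate_test_predict(
    pairs: List[Tuple[str, str]],
    translate_to_english: Callable[[str], str],
    english_classifier: Callable[[str, str], str],
) -> List[str]:
    """Translate each (premise, hypothesis) pair into English, then classify it."""
    predictions = []
    for premise, hypothesis in pairs:
        label = english_classifier(
            translate_to_english(premise),
            translate_to_english(hypothesis),
        )
        if label not in LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        predictions.append(label)
    return predictions


# Toy stand-ins (assumptions): a real setup would plug in an MT system and a
# classifier trained on English MultiNLI.
toy_dictionary = {
    "Le chat dort.": "The cat sleeps.",
    "Un animal dort.": "An animal sleeps.",
}


def toy_translate(sentence: str) -> str:
    # Dictionary lookup; unknown sentences are returned unchanged.
    return toy_dictionary.get(sentence, sentence)


def toy_classifier(premise: str, hypothesis: str) -> str:
    # Placeholder rule, not a real NLI model.
    if "sleeps" in premise and "sleeps" in hypothesis:
        return "entailment"
    return "neutral"


predictions = translate_test_predict(
    [("Le chat dort.", "Un animal dort.")], toy_translate, toy_classifier
)
```

The classifier only ever sees English text, so under this setup the quality of the translation system directly bounds the quality of the predictions.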

Results

Task	Dataset	Metric	Value	Model
Natural Language Inference	XNLI French	Accuracy	68.3	BiLSTM-max
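
The accuracy value above is a percentage of correctly classified premise-hypothesis pairs. As a rough illustration of how such a score is computed, the sketch below compares predicted and gold labels; the function name `accuracy` and the example labels are assumptions made for illustration, not taken from the paper's evaluation code.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Return the percentage of predictions that match the gold labels."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty evaluation set")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return 100.0 * correct / len(gold)


# Made-up labels for four test pairs; three of the four predictions are correct.
example_score = accuracy(
    ["entailment", "neutral", "contradiction", "neutral"],
    ["entailment", "neutral", "neutral", "neutral"],
)
```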
