Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


mT5: A massively multilingual pre-trained text-to-text transformer

Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel

2020-10-22 · NAACL 2021
Tasks: Reading Comprehension, Question Answering, Natural Language Inference, Common Sense Reasoning, Translation
Links: Paper · PDF · Code (official)

Abstract

The recent "Text-to-Text Transfer Transformer" (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We detail the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual benchmarks. We also describe a simple technique to prevent "accidental translation" in the zero-shot setting, where a generative model chooses to (partially) translate its prediction into the wrong language. All of the code and model checkpoints used in this work are publicly available.
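Like T5, mT5 is pre-trained with a span-corruption objective: random spans of the input are replaced by sentinel tokens, and the model learns to generate the removed spans. The sketch below illustrates that input/target construction in plain Python; the `span_corrupt` helper and the explicit `spans` argument are simplifications for illustration (the real pre-training samples span positions and lengths randomly), though the `<extra_id_N>` sentinel format matches the released T5/mT5 vocabularies.

```python
def span_corrupt(tokens, spans):
    """T5-style span corruption (simplified sketch).

    Replace each (start, end) token span with a sentinel token, and
    emit the removed spans, each preceded by its sentinel, as the
    target sequence. A final sentinel marks the end of the targets.
    """
    inputs, targets = [], []
    prev = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[prev:start])   # keep uncorrupted tokens
        inputs.append(sentinel)             # mark the removed span
        targets.append(sentinel)
        targets.extend(tokens[start:end])   # model must predict these
        prev = end
    inputs.extend(tokens[prev:])
    targets.append(f"<extra_id_{len(spans)}>")
    return inputs, targets


tokens = "Thank you for inviting me to your party last week".split()
inputs, targets = span_corrupt(tokens, spans=[(1, 2), (5, 7)])
# inputs:  Thank <extra_id_0> for inviting me <extra_id_1> party last week
# targets: <extra_id_0> you <extra_id_1> to your <extra_id_2>
```

The same text-to-text interface is reused at fine-tuning time, which is what makes one model applicable to all of the benchmark tasks listed below.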

Results

Task | Dataset | Metric | Value | Model
Reading Comprehension | MuSeRC | Average F1 | 0.844 | MT5 Large
Reading Comprehension | MuSeRC | EM | 0.543 | MT5 Large
Question Answering | DaNetQA | Accuracy | 0.657 | MT5 Large
Common Sense Reasoning | RWSD | Accuracy | 0.669 | MT5 Large
Common Sense Reasoning | PARus | Accuracy | 0.504 | MT5 Large
Common Sense Reasoning | RuCoS | Average F1 | 0.57 | MT5 Large
Common Sense Reasoning | RuCoS | EM | 0.562 | MT5 Large
Natural Language Inference | RCB | Accuracy | 0.454 | MT5 Large
Natural Language Inference | RCB | Average F1 | 0.366 | MT5 Large
Natural Language Inference | LiDiRus | MCC | 0.061 | MT5 Large
Natural Language Inference | TERRa | Accuracy | 0.561 | MT5 Large
Cross-Lingual | XTREME | Avg | 40.9 | mT5
Cross-Lingual | XTREME | Question Answering | 73.6 | mT5
Cross-Lingual | XTREME | Sentence-pair Classification | 89.8 | mT5
Cross-Lingual Transfer | XTREME | Avg | 40.9 | mT5
Cross-Lingual Transfer | XTREME | Question Answering | 73.6 | mT5
Cross-Lingual Transfer | XTREME | Sentence-pair Classification | 89.8 | mT5

Related Papers

From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering (2025-07-17)
Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It (2025-07-17)
City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
Comparing Apples to Oranges: A Dataset & Analysis of LLM Humour Understanding from Traditional Puns to Topical Jokes (2025-07-17)
A Translation of Probabilistic Event Calculus into Markov Decision Processes (2025-07-17)
Describe Anything Model for Visual Question Answering on Text-rich Images (2025-07-16)
Is This Just Fantasy? Language Model Representations Reflect Human Judgments of Event Plausibility (2025-07-16)