Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


MLDoc

Multilingual Document Classification Corpus

Texts | Custom | Introduced 2018-01-01

Multilingual Document Classification Corpus (MLDoc) is a cross-lingual document classification dataset covering English, German, French, Spanish, Italian, Russian, Japanese and Chinese. It is a subset of the Reuters Corpus Volume 2 selected according to the following design choices:

  • uniform class coverage: same number of examples for each class and language,
  • official train / development / test split: for each language, training sets of different sizes (1K, 2K, 5K and 10K stories), a development set (1K) and a test set (4K) are provided (with the exception of Spanish and Russian, with 9458 and 5216 training documents respectively).

Source: A Corpus for Multilingual Document Classification in Eight Languages
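
The "uniform class coverage" design choice above can be illustrated with a minimal Python sketch that draws the same number of examples per class from an unbalanced pool. This is not the procedure used to build MLDoc; the function name balanced_sample, the (label, text) pair format, and the toy stories are assumptions made for illustration. The four labels are the top-level Reuters categories.

```python
import random
from collections import Counter, defaultdict


def balanced_sample(examples, per_class, seed=0):
    """Return `per_class` examples for every label, drawn without replacement.

    `examples` is a list of (label, text) pairs (a hypothetical format).
    Raises ValueError if any label has fewer than `per_class` examples.
    """
    # Group the examples by their class label.
    by_label = defaultdict(list)
    for label, text in examples:
        by_label[label].append((label, text))

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    sample = []
    for label in sorted(by_label):
        pool = by_label[label]
        if len(pool) < per_class:
            raise ValueError(f"label {label!r} has only {len(pool)} examples")
        sample.extend(rng.sample(pool, per_class))

    rng.shuffle(sample)  # avoid leaving the subset grouped by class
    return sample


# Toy, deliberately unbalanced pool of stories (invented data).
toy = (
    [("CCAT", f"corporate story {i}") for i in range(10)]
    + [("ECAT", f"economics story {i}") for i in range(4)]
    + [("GCAT", f"government story {i}") for i in range(7)]
    + [("MCAT", f"markets story {i}") for i in range(5)]
)

subset = balanced_sample(toy, per_class=3)
counts = Counter(label for label, _ in subset)
print(counts)
```

Running the same function once per language with a fixed per_class gives every language the same class distribution, which is the property the design choice describes.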

Related Benchmarks

  • MLDoc Zero-Shot English-to-Chinese / Cross-Lingual / Accuracy
  • MLDoc Zero-Shot English-to-Chinese / Cross-Lingual Document Classification / Accuracy
  • MLDoc Zero-Shot English-to-French / Cross-Lingual / Accuracy
  • MLDoc Zero-Shot English-to-French / Cross-Lingual Document Classification / Accuracy
  • MLDoc Zero-Shot English-to-German / Cross-Lingual / Accuracy
  • MLDoc Zero-Shot English-to-German / Cross-Lingual Document Classification / Accuracy
  • MLDoc Zero-Shot English-to-Italian / Cross-Lingual / Accuracy
  • MLDoc Zero-Shot English-to-Italian / Cross-Lingual Document Classification / Accuracy
  • MLDoc Zero-Shot English-to-Japanese / Cross-Lingual / Accuracy
  • MLDoc Zero-Shot English-to-Japanese / Cross-Lingual Document Classification / Accuracy
  • MLDoc Zero-Shot English-to-Russian / Cross-Lingual / Accuracy
  • MLDoc Zero-Shot English-to-Russian / Cross-Lingual Document Classification / Accuracy
  • MLDoc Zero-Shot English-to-Spanish / Cross-Lingual / Accuracy
  • MLDoc Zero-Shot English-to-Spanish / Cross-Lingual Document Classification / Accuracy
  • MLDoc Zero-Shot German-to-French / Cross-Lingual / Accuracy
  • MLDoc Zero-Shot German-to-French / Cross-Lingual Document Classification / Accuracy

Statistics

Papers: 53
Benchmarks: 0


Tasks

  • Cross-Lingual Document Classification
  • Cross-Lingual Sentiment Classification