FLoRes

Facebook Low Resource MT Benchmark

License: CC BY-SA 4.0 | Introduced: 2019-11-01

FLoRes is a benchmark dataset for machine translation between English and four low-resource languages (Nepali, Sinhala, Khmer, and Pashto), built from translated Wikipedia sentences. The project was later extended into two multilingual versions: **FLoRes-101** and **FLoRes-200**.

  • **FLoRes-101**: The first of the two multilingual versions. It covers 101 languages, so researchers can measure translation quality in all 10,100 directions between them (101 × 100 ordered source/target pairs).

  • **FLoRes-200**: An updated version that roughly doubles the language coverage of FLoRes-101. Because many of the newly added languages are less standardized and require more specialized professional translators, verifying the translations became more complex.
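The direction counts above follow from simple arithmetic: with n languages, every language can serve as source or target, giving n × (n − 1) ordered pairs. The sketch below illustrates this, assuming nominal language counts of 101 and 200 taken from the dataset names (the exact number of language variants in a release may differ) and assuming the original four-language benchmark is evaluated in both directions with English. The helper names `num_directions` and `english_centric_directions` are hypothetical and not part of any FLoRes tooling.

```python
from itertools import permutations


def num_directions(num_languages: int) -> int:
    """Count ordered (source, target) pairs where source != target."""
    return num_languages * (num_languages - 1)


def english_centric_directions(languages: list[str], pivot: str = "English") -> list[tuple[str, str]]:
    """Hypothetical helper: only directions into or out of the pivot language."""
    pairs = []
    for lang in languages:
        pairs.append((pivot, lang))  # e.g. English -> Nepali
        pairs.append((lang, pivot))  # e.g. Nepali -> English
    return pairs


# Original FLoRes: English paired with four low-resource languages
# (assumes both directions are evaluated for each pair).
original = ["Nepali", "Sinhala", "Khmer", "Pashto"]
print(len(english_centric_directions(original)))  # 8

# Many-to-many evaluation: any language can be source or target.
print(num_directions(101))  # 10100
print(num_directions(200))  # 39800

# Cross-check the closed form against explicit enumeration on a small set.
small = ["English"] + original
assert len(list(permutations(small, 2))) == num_directions(len(small))
```

The contrast shows why the multilingual versions matter: an English-centric design grows linearly with the number of languages, while many-to-many evaluation grows quadratically.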