TTC

Tatoeba Translation Challenge

Introduced 2020-10-13

The Tatoeba Translation Challenge is a challenge set for machine translation that contains 32G translation units in 2,539 bitexts. The data set as a whole covers 487 languages, linked in 4,024 language pairs. The release includes 657 test sets derived from Tatoeba.org, covering 138 languages. The training data is compiled from various sources collected within the OPUS project.