WikiMatrix

Modality: Texts
License: CC BY-SA 4.0

WikiMatrix is a dataset of parallel sentences mined from the textual content of Wikipedia. The extraction is not limited to alignments with English; it systematically considers all possible language pairs. The mined data consists of:

85 different languages, 1620 language pairs
134M parallel sentences, out of which 34M are aligned with English
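A typical first step with mined bitext is to keep only pairs above some alignment-quality threshold. The sketch below assumes each language-pair file (for example a hypothetical `WikiMatrix.en-fr.tsv.gz`) holds one tab-separated line per pair: a margin score, the source sentence, and the target sentence. The file name, the column layout, and the default threshold of 1.04 are all assumptions to check against the actual release, not a documented API.

```python
import gzip
from typing import Iterable, Iterator, Tuple


def filter_bitext(lines: Iterable[str], min_margin: float = 1.04) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) pairs whose margin score is at least min_margin.

    Assumed line layout (hypothetical): "<score>\t<source>\t<target>".
    """
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip rows that do not match the assumed three-column layout
        score, src, tgt = parts
        if float(score) >= min_margin:
            yield src, tgt


def read_gzipped_lines(path: str) -> Iterator[str]:
    """Stream text lines from a gzipped UTF-8 file without loading it all into memory."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        yield from f


if __name__ == "__main__":
    sample = [
        "1.10\tHello world.\tBonjour le monde.\n",
        "1.01\tGood morning.\tBonsoir.\n",
    ]
    print(list(filter_bitext(sample)))  # [('Hello world.', 'Bonjour le monde.')]
```

Splitting on tabs rather than using a CSV reader avoids treating quote characters inside sentences as field delimiters. For a real file, pass `read_gzipped_lines(path)` to `filter_bitext` so the corpus is streamed line by line.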

Source: WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia