WikiMatrix
Texts | CC BY-SA 4.0
WikiMatrix is a dataset of parallel sentences mined from the textual content of Wikipedia for all possible language pairs. The mined data consists of:
- 85 different languages, 1620 language pairs
- 134M parallel sentences, out of which 34M are aligned with English
Source: WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia
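
As a rough illustration of working with mined parallel sentences, the sketch below filters sentence pairs by an alignment score. It assumes each record is a tab-separated line of the form score, source sentence, target sentence; that layout, the function name `read_pairs`, the `min_score` value, and the sample sentences are assumptions made for this example and are not stated in the description above.

```python
import io


def read_pairs(handle, min_score=1.04):
    """Yield (score, source, target) tuples with score >= min_score.

    Assumes one record per line: score<TAB>source<TAB>target.
    This layout is an assumption for illustration only.
    """
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # skip lines that do not match the assumed layout
        score = float(fields[0])
        if score >= min_score:
            yield score, fields[1], fields[2]


# Hypothetical in-memory sample standing in for a real data file.
sample = io.StringIO(
    "1.20\tHello world.\tBonjour le monde.\n"
    "1.01\tGood morning.\tBonne nuit.\n"
)
pairs = list(read_pairs(sample))
```

Raising `min_score` keeps fewer but more reliable pairs, while lowering it keeps more data at the cost of noisier alignments; the right cutoff depends on the downstream task.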