SpeechMatrix

Speech | CC-BY-NC 4.0 | Introduced 2022-10-19

SpeechMatrix is a large-scale multilingual corpus of speech-to-speech translations mined from real speech in European Parliament recordings. It contains speech alignments in 136 language pairs, totaling 418 thousand hours of speech.

Source: SpeechMatrix: A Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations

Image Source: https://scontent-lhr8-2.xx.fbcdn.net/v/t39.8562-6/310002966_605149234737289_5204270723809834290_n.pdf?_nc_cat=102&ccb=1-7&_nc_sid=ad8a9d&_nc_ohc=FN2KnupyKI0AX90B5UO&_nc_ht=scontent-lhr8-2.xx&oh=00_AT9iFWHchGOnkzVTmwiYIDElIXSnwilSGhDwRQdFh99rlA&oe=63560915