CMU Wilderness Multilingual Speech Dataset
Speech
The CMU Wilderness Multilingual Speech Dataset is a dataset of over 700 different languages providing audio, aligned text and word pronunciations. On average each language provides around 20 hours of sentence-lengthed transcriptions.
Source: Alan W Black "CMU Wilderness Multilingual Speech Dataset" ICASSP 2019, Brighton, UK.