CMU Wilderness Multilingual Speech Dataset

Speech

The CMU Wilderness Multilingual Speech Dataset is a dataset of over 700 different languages providing audio, aligned text and word pronunciations. On average each language provides around 20 hours of sentence-lengthed transcriptions.

Source: Alan W Black "CMU Wilderness Multilingual Speech Dataset" ICASSP 2019, Brighton, UK.