MediaSpeech: Multilanguage ASR Benchmark and Dataset

Rostislav Kolobov, Olga Okhapkina, Olga Omelchishina, Andrey Platunov, Roman Bedyakin, Vyacheslav Moshkin, Dmitry Menshikov, Nikolay Mikhaylovskiy

2021-03-30 · Speech Recognition

Abstract

The performance of automated speech recognition (ASR) systems is well known to differ across application domains. At the same time, vendors and research groups typically report ASR quality results either for limited, simplistic domains (audiobooks, TED talks) or for proprietary datasets. To fill this gap, we provide NTR MediaSpeech, an open-source 10-hour ASR system evaluation dataset for 4 languages: Spanish, French, Turkish and Arabic. The dataset was collected from the official YouTube channels of media in the respective languages and manually transcribed. We estimate that the WER of the dataset is under 5%. We have benchmarked many ASR systems, both commercially and freely available, and provide the benchmark results. We also open-source baseline QuartzNet models for each language.
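For readers unfamiliar with the metric, WER (word error rate) is conventionally the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the authors' scoring script: the function name `word_error_rate` is hypothetical, and the whitespace tokenization with no text normalization (casing, punctuation, diacritics) is an assumption that a real evaluation would likely need to revisit.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words.

    Assumption: words are separated by whitespace and compared exactly,
    with no normalization of case, punctuation, or diacritics.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, this value can exceed 1.0 when the hypothesis is much longer than the reference.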

Results

Task	Dataset	Metric	Value	Model
Speech Recognition	MediaSpeech	WER for Arabic	0.13	Quartznet
Speech Recognition	MediaSpeech	WER for French	0.1915	Quartznet
Speech Recognition	MediaSpeech	WER for Spanish	0.1826	Quartznet
Speech Recognition	MediaSpeech	WER for Turkish	0.1422	Quartznet
Speech Recognition	MediaSpeech	WER for Arabic	0.2333	Wit
Speech Recognition	MediaSpeech	WER for French	0.1759	Wit
Speech Recognition	MediaSpeech	WER for Spanish	0.0879	Wit
Speech Recognition	MediaSpeech	WER for Turkish	0.0768	Wit
Speech Recognition	MediaSpeech	WER for Arabic	0.3016	Azure
Speech Recognition	MediaSpeech	WER for French	0.1683	Azure
Speech Recognition	MediaSpeech	WER for Spanish	0.1296	Azure
Speech Recognition	MediaSpeech	WER for Turkish	0.2296	Azure
Speech Recognition	MediaSpeech	WER for Arabic	0.3085	VOSK
Speech Recognition	MediaSpeech	WER for French	0.2111	VOSK
Speech Recognition	MediaSpeech	WER for Spanish	0.197	VOSK
Speech Recognition	MediaSpeech	WER for Turkish	0.305	VOSK
Speech Recognition	MediaSpeech	WER for Arabic	0.4464	Google
Speech Recognition	MediaSpeech	WER for French	0.2385	Google
Speech Recognition	MediaSpeech	WER for Spanish	0.2176	Google
Speech Recognition	MediaSpeech	WER for Turkish	0.2707	Google
Speech Recognition	MediaSpeech	WER for Arabic	0.9596	wav2vec
Speech Recognition	MediaSpeech	WER for French	0.3113	wav2vec
Speech Recognition	MediaSpeech	WER for Spanish	0.2469	wav2vec
Speech Recognition	MediaSpeech	WER for Turkish	0.5812	wav2vec
Speech Recognition	MediaSpeech	WER for French	0.4741	Deepspeech
Speech Recognition	MediaSpeech	WER for Spanish	0.4236	Deepspeech
Speech Recognition	MediaSpeech	WER for Spanish	0.307	Silero
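Since the table lists one row per model and language, it can be easier to read once regrouped. The sketch below copies the WER values from the rows above into a dictionary and picks the lowest-WER model for each language; the names `WER` and `best_per_language` are hypothetical helpers introduced for illustration, not part of any released code.

```python
# WER values copied from the results table (model -> language -> WER).
# DeepSpeech and Silero are listed only for the languages reported above.
WER = {
    "Quartznet": {"Arabic": 0.13, "French": 0.1915, "Spanish": 0.1826, "Turkish": 0.1422},
    "Wit": {"Arabic": 0.2333, "French": 0.1759, "Spanish": 0.0879, "Turkish": 0.0768},
    "Azure": {"Arabic": 0.3016, "French": 0.1683, "Spanish": 0.1296, "Turkish": 0.2296},
    "VOSK": {"Arabic": 0.3085, "French": 0.2111, "Spanish": 0.197, "Turkish": 0.305},
    "Google": {"Arabic": 0.4464, "French": 0.2385, "Spanish": 0.2176, "Turkish": 0.2707},
    "wav2vec": {"Arabic": 0.9596, "French": 0.3113, "Spanish": 0.2469, "Turkish": 0.5812},
    "Deepspeech": {"French": 0.4741, "Spanish": 0.4236},
    "Silero": {"Spanish": 0.307},
}


def best_per_language(wer):
    """Map each language to the (model, WER) pair with the lowest WER."""
    best = {}
    for model, scores in wer.items():
        for language, value in scores.items():
            if language not in best or value < best[language][1]:
                best[language] = (model, value)
    return best


if __name__ == "__main__":
    for language, (model, value) in sorted(best_per_language(WER).items()):
        print(f"{language}: {model} ({value:.4f})")
```

On the values listed, the lowest WER is Quartznet for Arabic (0.13), Azure for French (0.1683), and Wit for both Spanish (0.0879) and Turkish (0.0768).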
