Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

MediaSpeech: Multilanguage ASR Benchmark and Dataset

Rostislav Kolobov, Olga Okhapkina, Olga Omelchishina, Andrey Platunov, Roman Bedyakin, Vyacheslav Moshkin, Dmitry Menshikov, Nikolay Mikhaylovskiy

2021-03-30 · Speech Recognition

Abstract

The performance of automated speech recognition (ASR) systems is well known to differ across application domains. At the same time, vendors and research groups typically report ASR quality results either for limited, simplistic domains (audiobooks, TED talks) or for proprietary datasets. To fill this gap, we provide an open-source 10-hour ASR system evaluation dataset, NTR MediaSpeech, for 4 languages: Spanish, French, Turkish and Arabic. The dataset was collected from the official YouTube channels of media in the respective languages and manually transcribed. We estimate that the WER of the dataset is under 5%. We have benchmarked many ASR systems available both commercially and freely, and provide the benchmark results. We also open-source baseline QuartzNet models for each language.

Results

Task                Dataset      Metric           Value   Model
Speech Recognition  MediaSpeech  WER for Arabic   0.13    Quartznet
Speech Recognition  MediaSpeech  WER for French   0.1915  Quartznet
Speech Recognition  MediaSpeech  WER for Spanish  0.1826  Quartznet
Speech Recognition  MediaSpeech  WER for Turkish  0.1422  Quartznet
Speech Recognition  MediaSpeech  WER for Arabic   0.2333  Wit
Speech Recognition  MediaSpeech  WER for French   0.1759  Wit
Speech Recognition  MediaSpeech  WER for Spanish  0.0879  Wit
Speech Recognition  MediaSpeech  WER for Turkish  0.0768  Wit
Speech Recognition  MediaSpeech  WER for Arabic   0.3016  Azure
Speech Recognition  MediaSpeech  WER for French   0.1683  Azure
Speech Recognition  MediaSpeech  WER for Spanish  0.1296  Azure
Speech Recognition  MediaSpeech  WER for Turkish  0.2296  Azure
Speech Recognition  MediaSpeech  WER for Arabic   0.3085  VOSK
Speech Recognition  MediaSpeech  WER for French   0.2111  VOSK
Speech Recognition  MediaSpeech  WER for Spanish  0.197   VOSK
Speech Recognition  MediaSpeech  WER for Turkish  0.305   VOSK
Speech Recognition  MediaSpeech  WER for Arabic   0.4464  Google
Speech Recognition  MediaSpeech  WER for French   0.2385  Google
Speech Recognition  MediaSpeech  WER for Spanish  0.2176  Google
Speech Recognition  MediaSpeech  WER for Turkish  0.2707  Google
Speech Recognition  MediaSpeech  WER for Arabic   0.9596  wav2vec
Speech Recognition  MediaSpeech  WER for French   0.3113  wav2vec
Speech Recognition  MediaSpeech  WER for Spanish  0.2469  wav2vec
Speech Recognition  MediaSpeech  WER for Turkish  0.5812  wav2vec
Speech Recognition  MediaSpeech  WER for French   0.4741  Deepspeech
Speech Recognition  MediaSpeech  WER for Spanish  0.4236  Deepspeech
Speech Recognition  MediaSpeech  WER for Spanish  0.307   Silero
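
The results above are word error rates (WER), which appear to be reported as fractions rather than percentages (for example, 0.13 would correspond to 13%). As a minimal sketch of how WER is commonly computed, the function below divides the word-level edit distance (substitutions, deletions, insertions) by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the page does not describe the text normalization or scoring tool used for the benchmark, so this may not reproduce the reported figures.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative only: tokenization is plain whitespace splitting, with no
    case folding or punctuation handling.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference, so it is not strictly a proportion of words recognized incorrectly.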
