Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Towards Building Text-To-Speech Systems for the Next Billion Users

Gokul Karthik Kumar, Praveen S V, Pratyush Kumar, Mitesh M. Khapra, Karthik Nandakumar

2022-11-17

Tasks: Speech Synthesis - Bengali, Speech Synthesis - Manipuri, Speech Synthesis - Telugu, Speech Synthesis - Bodo, Speech Synthesis - Kannada, Text to Speech, Speech Synthesis - Malayalam, Speech Synthesis, Speech Synthesis - Hindi, Text-To-Speech Synthesis, Speech Synthesis - Assamese, Speech Synthesis - Tamil, Speech Synthesis - Marathi, Speech Synthesis - Rajasthani, Speech Synthesis - Gujarati, text-to-speech

Abstract

Deep learning based text-to-speech (TTS) systems have been evolving rapidly with advances in model architectures, training methodologies, and generalization across speakers and languages. However, these advances have not been thoroughly investigated for Indian language speech synthesis. Such investigation is computationally expensive given the number and diversity of Indian languages, relatively lower resource availability, and the diverse set of advances in neural TTS that remain untested. In this paper, we evaluate the choice of acoustic models, vocoders, supplementary loss functions, training schedules, and speaker and language diversity for Dravidian and Indo-Aryan languages. Based on this, we identify monolingual models with FastPitch and HiFi-GAN V1, trained jointly on male and female speakers to perform the best. With this setup, we train and evaluate TTS models for 13 languages and find our models to significantly improve upon existing models in all languages as measured by mean opinion scores. We open-source all models on the Bhashini platform.

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Speech Synthesis | IndicTTS | Mean Opinion Score | 3.58 | AI4BharatTTS - FastPitch with HiFiGAN |
| Speech Synthesis | IndicTTS | Mean Opinion Score | 3.84 | AI4BharatTTS - FastPitch with HiFiGAN |
| Speech Synthesis | IndicTTS | Mean Opinion Score | 3.68 | AI4BharatTTS - FastPitch with HiFiGAN |
| Speech Synthesis | IndicTTS | Mean Opinion Score | 3.64 | AI4BharatTTS - FastPitch with HiFiGAN |
| Speech Synthesis | IndicTTS | Mean Opinion Score | 3.66 | AI4BharatTTS - FastPitch with HiFiGAN |
| Speech Synthesis | IndicTTS | Mean Opinion Score | 2.39 | AI4BharatTTS - FastPitch with HiFiGAN |
| Speech Synthesis | IndicTTS | Mean Opinion Score | 3.37 | AI4BharatTTS - FastPitch with HiFiGAN |
| Speech Synthesis | IndicTTS | Mean Opinion Score | 3.53 | AI4BharatTTS - FastPitch with HiFiGAN |
| Speech Synthesis | IndicTTS | Mean Opinion Score | 4 | AI4BharatTTS - FastPitch with HiFiGAN |
| Speech Synthesis | IndicTTS | Mean Opinion Score | 3.3 | AI4BharatTTS - FastPitch with HiFiGAN |
| Speech Synthesis | IndicTTS | Mean Opinion Score | 3.26 | AI4BharatTTS - FastPitch with HiFiGAN |
| Speech Synthesis | IndicTTS | Mean Opinion Score | 3.4 | AI4BharatTTS - FastPitch with HiFiGAN |
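
The twelve distinct Mean Opinion Score values listed above can be summarized with a few lines of Python. This is only an illustrative sketch: the `summarize` helper and the `mos_values` name are hypothetical and not part of the paper's released code, and the listing does not say which language each score belongs to, so the values are treated as an unlabeled set.

```python
from statistics import mean, median

# Mean Opinion Score values exactly as they appear in the results listing.
# The listing does not name the language for each row, so none is assumed here.
mos_values = [3.58, 3.84, 3.68, 3.64, 3.66, 2.39,
              3.37, 3.53, 4.0, 3.3, 3.26, 3.4]


def summarize(scores):
    """Return basic descriptive statistics for a list of MOS values.

    Hypothetical helper written for this example only.
    """
    return {
        "count": len(scores),
        "mean": mean(scores),
        "median": median(scores),
        "min": min(scores),
        "max": max(scores),
    }


summary = summarize(mos_values)
for name, value in summary.items():
    # Two decimals is enough for scores reported at that precision.
    print(f"{name}: {value:.2f}" if isinstance(value, float) else f"{name}: {value}")
```

A summary like this makes the spread visible at a glance: most of the listed scores sit between 3.26 and 4, with a single low value of 2.39.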

Related Papers

Hear Your Code Fail, Voice-Assisted Debugging for Python (2025-07-20)
NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech (2025-07-17)
P.808 Multilingual Speech Enhancement Testing: Approach and Results of URGENT 2025 Challenge (2025-07-15)
An Empirical Evaluation of AI-Powered Non-Player Characters' Perceived Realism and Performance in Virtual Reality Environments (2025-07-14)
ZipVoice-Dialog: Non-Autoregressive Spoken Dialogue Generation with Flow Matching (2025-07-12)
Exploiting Leaderboards for Large-Scale Distribution of Malicious Models (2025-07-11)
MIDI-VALLE: Improving Expressive Piano Performance Synthesis Through Neural Codec Language Modelling (2025-07-11)
Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis (2025-07-08)