Tacotron

Sequential | Introduced 2000 | 65 papers

Description

Tacotron is an end-to-end generative text-to-speech model that takes a character sequence as input and outputs the corresponding spectrogram. Its backbone is a seq2seq model with attention, made up of an encoder, an attention-based decoder, and a post-processing net. The spectrogram frames the model produces are then converted to waveforms.
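The data flow described above (characters, encoder, attention-based decoder, post-processing net, spectrogram frames) can be sketched in a few lines of plain Python. This is a toy illustration only, not the published architecture: the weights are random and untrained, each component is a single linear layer, plain dot-product attention stands in for whatever attention mechanism the model actually uses, and the decoder runs for a fixed number of frames. All names and sizes (`VOCAB`, `HIDDEN`, `N_BINS`, `N_OUT`, `synthesize`) are hypothetical, and the final spectrogram-to-waveform step is omitted.

```python
import math
import random

random.seed(0)

# All sizes and names below are hypothetical, chosen for illustration only.
VOCAB = "abcdefghijklmnopqrstuvwxyz '.,?!"  # assumed character set
HIDDEN = 8   # assumed hidden size
N_BINS = 4   # assumed size of each decoder output frame
N_OUT = 6    # assumed size of each post-processed spectrogram frame


def rand_matrix(rows, cols):
    """Random, untrained weights standing in for learned parameters."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


EMBED = rand_matrix(len(VOCAB), HIDDEN)   # character embedding table
W_ENC = rand_matrix(HIDDEN, HIDDEN)       # toy encoder layer
W_DEC = rand_matrix(HIDDEN, 2 * HIDDEN)   # toy decoder layer: [state; context] -> state
W_OUT = rand_matrix(N_BINS, HIDDEN)       # decoder state -> frame
W_POST = rand_matrix(N_OUT, N_BINS)       # toy post-processing layer


def encode(text):
    """Encoder: map each known character to one hidden vector."""
    ids = [VOCAB.index(ch) for ch in text.lower() if ch in VOCAB]
    return [[math.tanh(x) for x in matvec(W_ENC, EMBED[i])] for i in ids]


def attend(query, memory):
    """Dot-product attention (a simplification): weights over encoder outputs."""
    weights = softmax([sum(q * m for q, m in zip(query, mem)) for mem in memory])
    context = [sum(w * mem[d] for w, mem in zip(weights, memory)) for d in range(HIDDEN)]
    return context, weights


def decode(memory, n_frames):
    """Attention-based decoder: emit one frame per step for a fixed number of steps."""
    state = [0.0] * HIDDEN
    frames, alignments = [], []
    for _ in range(n_frames):
        context, weights = attend(state, memory)
        state = [math.tanh(x) for x in matvec(W_DEC, state + context)]
        frames.append(matvec(W_OUT, state))
        alignments.append(weights)
    return frames, alignments


def postnet(frames):
    """Post-processing net stand-in: one linear map applied to every frame."""
    return [matvec(W_POST, frame) for frame in frames]


def synthesize(text, n_frames=5):
    """Characters in, spectrogram-like frames and attention alignments out."""
    memory = encode(text)
    frames, alignments = decode(memory, n_frames)
    return postnet(frames), alignments
```

Calling `synthesize("hello", n_frames=5)` returns five frames of `N_OUT` values each, plus one attention distribution over the five input characters per frame. A real system would learn the weights from paired text and audio and decide for itself when to stop emitting frames.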

Papers Using This Method

Robust and Unbounded Length Generalization in Autoregressive Transformer-Based Text-to-Speech (2024-10-29)
Enhancing Kurdish Text-to-Speech with Native Corpus Training: A High-Quality WaveGlow Vocoder Approach (2024-09-10)
Training Universal Vocoders with Feature Smoothing-Based Augmentation Methods for High-Quality TTS Systems (2024-09-04)
Leveraging the Interplay Between Syntactic and Acoustic Cues for Optimizing Korean TTS Pause Formation (2024-04-03)
An overview of text-to-speech systems and media applications (2023-10-22)
Energy-Based Models For Speech Synthesis (2023-10-19)
The DeepZen Speech Synthesis System for Blizzard Challenge 2023 (2023-08-30)
Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration (2023-05-25)
A Virtual Simulation-Pilot Agent for Training of Air Traffic Controllers (2023-04-16)
ArmanTTS single-speaker Persian dataset (2023-04-07)
Investigation of Japanese PnG BERT language model in text-to-speech synthesis for pitch accent language (2022-12-16)
Investigating Content-Aware Neural Text-To-Speech MOS Prediction Using Prosodic and Linguistic Features (2022-11-01)
Cross-lingual Text-To-Speech with Flow-based Voice Conversion for Improved Pronunciation (2022-10-31)
Towards Developing State-of-the-Art TTS Synthesisers for 13 Indian Languages with Signal Processing aided Alignments (2022-10-31)
Efficiently Trained Low-Resource Mongolian Text-to-Speech System Based On FullConv-TTS (2022-10-24)
Facial Landmark Predictions with Applications to Metaverse (2022-09-29)
Self-supervised learning for robust voice cloning (2022-04-07)
Singing-Tacotron: Global duration control attention and dynamic filter for End-to-end singing voice synthesis (2022-02-16)
Zero-Shot Long-Form Voice Cloning with Dynamic Convolution Attention (2022-01-25)
Word-Level Style Control for Expressive, Non-attentive Speech Synthesis (2021-11-19)