Description
Glow-TTS is a flow-based generative model for parallel TTS that does not require any external aligner. By combining the properties of flows and dynamic programming, the model searches for the most probable monotonic alignment between the text and the latent representation of speech. It is trained directly to maximize the log-likelihood of speech under that alignment. Enforcing hard monotonic alignments makes synthesis robust and lets it generalize to long utterances, while the use of flows enables fast, diverse, and controllable speech synthesis.
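The dynamic-programming search for the most probable monotonic alignment can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `monotonic_alignment_search` and the toy `log_p` matrix are assumptions made for illustration. In the actual model, `log_p[i, j]` would be the log-likelihood of latent frame `j` under the prior predicted for text token `i`. The sketch assumes every token is assigned at least one frame and that no token is skipped.

```python
import numpy as np

def monotonic_alignment_search(log_p):
    """Illustrative sketch (hypothetical helper, not the reference code).

    log_p[i, j] is the log-likelihood of latent frame j under text token i.
    Returns the token index assigned to each frame and the total score.
    """
    n_text, n_frames = log_p.shape
    if n_text > n_frames:
        raise ValueError("need at least one frame per text token")

    # q[i, j]: best score of any monotonic path ending at token i, frame j.
    q = np.full((n_text, n_frames), -np.inf)
    q[0, 0] = log_p[0, 0]
    for j in range(1, n_frames):
        for i in range(min(j + 1, n_text)):
            stay = q[i, j - 1]                                # same token as previous frame
            advance = q[i - 1, j - 1] if i > 0 else -np.inf   # move on to the next token
            q[i, j] = max(stay, advance) + log_p[i, j]

    # Backtrack from the last token and last frame to recover the path.
    alignment = np.zeros(n_frames, dtype=int)
    i = n_text - 1
    for j in range(n_frames - 1, -1, -1):
        alignment[j] = i
        if j > 0 and i > 0 and (i == j or q[i - 1, j - 1] >= q[i, j - 1]):
            i -= 1
    return alignment, float(q[-1, -1])

# Toy example: 2 text tokens, 4 latent frames (values are made up).
log_p = np.array([[-1.0, -1.0, -5.0, -5.0],
                  [-5.0, -5.0, -1.0, -1.0]])
alignment, score = monotonic_alignment_search(log_p)
print(alignment.tolist(), score)  # [0, 0, 1, 1] -4.0
```

The table `q` is filled frame by frame, so the cost is proportional to the number of tokens times the number of frames. The backtracking pass then yields a hard alignment in which token indices never decrease and never jump by more than one.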
Papers Using This Method
Super Monotonic Alignment Search (2024-09-12)
Stochastic Pitch Prediction Improves the Diversity and Naturalness of Speech in Glow-TTS (2023-05-28)
ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus (2023-02-28)
GlowVC: Mel-spectrogram space disentangling model for language-independent text-free voice conversion (2022-07-04)
PortaSpeech: Portable and High-Quality Generative Text-to-Speech (2021-09-30)
Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech (2021-01-01)
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search (2020-05-22)