ClariNet

Sequential, introduced 2018, 7 papers

Description

ClariNet is an end-to-end text-to-speech (TTS) architecture. Unlike previous TTS systems, which pair a text-to-spectrogram model with a separate waveform synthesizer (vocoder), ClariNet is a fully convolutional text-to-wave architecture that can be trained from scratch. In ClariNet, the WaveNet module is conditioned on the hidden states rather than on the mel spectrogram. The architecture is otherwise based on Deep Voice 3.
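The difference in data flow can be illustrated with a toy sketch. This is not the ClariNet network: it has no convolutions, attention, or training, and every name and size in it (toy_text_to_hidden, toy_vocoder, HIDDEN_DIM, MEL_DIM, SAMPLES_PER_FRAME, the random weight matrices) is a hypothetical stand-in chosen for illustration. It only shows where the vocoder's conditioning input comes from in each design.

```python
import random

random.seed(0)

# All sizes below are hypothetical, chosen only to keep the toy small.
HIDDEN_DIM = 8          # width of the text model's hidden states
MEL_DIM = 4             # number of mel bands in the intermediate spectrogram
SAMPLES_PER_FRAME = 3   # waveform samples produced per conditioning frame


def random_matrix(rows, cols):
    """Random weights standing in for trained parameters."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def toy_text_to_hidden(text):
    """Stand-in for the text model: one hidden vector per character."""
    return [
        [((ord(ch) * (i + 1)) % 17) / 17.0 for i in range(HIDDEN_DIM)]
        for ch in text
    ]


def toy_vocoder(conditioning, weights):
    """Stand-in for a WaveNet-style vocoder: maps each conditioning
    frame to SAMPLES_PER_FRAME waveform samples."""
    wave = []
    for frame in conditioning:
        wave.extend(matvec(weights, frame))
    return wave


W_mel = random_matrix(MEL_DIM, HIDDEN_DIM)
W_voc_mel = random_matrix(SAMPLES_PER_FRAME, MEL_DIM)
W_voc_hidden = random_matrix(SAMPLES_PER_FRAME, HIDDEN_DIM)


def two_stage_tts(text):
    """Earlier design: hidden states -> mel spectrogram -> separate vocoder."""
    hidden = toy_text_to_hidden(text)
    mel = [matvec(W_mel, h) for h in hidden]
    return toy_vocoder(mel, W_voc_mel)


def text_to_wave_tts(text):
    """Text-to-wave design: the vocoder is conditioned on hidden states directly."""
    hidden = toy_text_to_hidden(text)
    return toy_vocoder(hidden, W_voc_hidden)
```

In the second function the vocoder never sees a mel spectrogram, so in a real trainable model the waveform loss could propagate back into the text model, which is what makes joint training from scratch possible.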

Papers Using This Method

Clarinet: A Music Retrieval System (2022-10-23)
Learning from a Complementary-label Source Domain: Theory and Algorithms (2020-08-04)
Clarinet: A One-step Approach Towards Budget-friendly Unsupervised Domain Adaptation (2020-07-29)
Multi-Speaker End-to-End Speech Synthesis (2019-07-09)
Non-Autoregressive Neural Text-to-Speech (2019-05-21)
Neural source-filter waveform models for statistical parametric speech synthesis (2019-04-27)
ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech (2018-07-19)