Description
WaveGrad is a conditional model for waveform generation that estimates gradients of the data density, building on prior work on score matching and diffusion probabilistic models. It starts from Gaussian white noise and iteratively refines the signal with a gradient-based sampler conditioned on the mel-spectrogram. WaveGrad is non-autoregressive and requires only a constant number of generation steps at inference time; as few as 6 iterations suffice to generate high-fidelity audio.
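The iterative refinement described above can be sketched as a DDPM-style reverse loop: start from white noise and repeatedly subtract the predicted noise, conditioned on the mel-spectrogram. This is a minimal illustrative sketch, not the paper's implementation; the function names, the hop size of 300 samples per mel frame, and the placeholder noise-prediction model are assumptions.

```python
import numpy as np

HOP_SIZE = 300  # assumed samples of audio per mel frame; the real value is a model choice

def wavegrad_sample(eps_model, mel, betas, seed=0):
    """Hedged sketch of WaveGrad-style sampling: iteratively refine
    Gaussian white noise into a waveform over len(betas) steps.

    eps_model(y, mel, sqrt_alpha_bar) -> predicted noise, same shape as y.
    """
    rng = np.random.default_rng(seed)
    alphas = 1.0 - betas                 # per-step retention factors
    alpha_bars = np.cumprod(alphas)      # cumulative products over the schedule
    n_samples = mel.shape[0] * HOP_SIZE

    y = rng.standard_normal(n_samples)   # start from Gaussian white noise
    for n in reversed(range(len(betas))):
        # Predict the noise component, conditioned on the mel-spectrogram
        # and the continuous noise level sqrt(alpha_bar_n).
        eps = eps_model(y, mel, np.sqrt(alpha_bars[n]))
        # Remove the predicted noise (standard DDPM posterior mean).
        y = (y - (1.0 - alphas[n]) / np.sqrt(1.0 - alpha_bars[n]) * eps) \
            / np.sqrt(alphas[n])
        if n > 0:
            # Add fresh noise for all but the final step.
            sigma = np.sqrt((1.0 - alpha_bars[n - 1])
                            / (1.0 - alpha_bars[n]) * betas[n])
            y = y + sigma * rng.standard_normal(n_samples)
    return y
```

With a 6-step noise schedule, the loop runs a constant 6 refinement passes regardless of audio length, which is what makes the constant-step inference claim possible.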
Papers Using This Method
- GLA-Grad: A Griffin-Lim Extended Waveform Generation Diffusion Model (2024-02-09)
- BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis (2022-03-25)
- InferGrad: Improving Diffusion Models for Vocoder by Considering Inference in Training (2022-02-08)
- Quasi-Taylor Samplers for Diffusion Generative Models based on Ideal Derivatives (2021-12-26)
- VocBench: A Neural Vocoder Benchmark for Speech Synthesis (2021-12-06)
- WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis (2021-06-17)
- WaveGrad: Estimating Gradients for Waveform Generation (2020-09-02)