PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation

Sang-Hoon Lee, Ha-Yeong Choi, Seong-Whan Lee

2024-08-14 | Text to Speech | Speech Synthesis | text-to-speech

Abstract

Recently, universal waveform generation has been investigated under various out-of-distribution conditions. Although GAN-based methods generate waveforms quickly, they are vulnerable to train-inference mismatch, as in two-stage text-to-speech. Diffusion-based models have shown powerful generative performance in other domains, but their slow inference has kept them from wide use in waveform generation. Above all, no existing generator architecture can explicitly disentangle the natural periodic features of high-resolution waveform signals. In this paper, we propose PeriodWave, a novel universal waveform generation model. First, we introduce a period-aware flow matching estimator that captures the periodic features of the waveform signal when estimating the vector fields. Second, we use a multi-period estimator whose periods avoid overlaps, so that each one captures different periodic features of the waveform. Although increasing the number of periods improves performance significantly, it also increases the computational cost. To mitigate this, we propose a single period-conditional universal estimator that processes all periods in parallel in one forward pass through period-wise batch inference. Finally, we use the discrete wavelet transform to losslessly disentangle the frequency information of waveform signals for high-frequency modeling, and introduce FreeU to reduce high-frequency noise in the generated waveform. Experimental results demonstrate that our model outperforms previous models in both Mel-spectrogram reconstruction and text-to-speech tasks. All source code will be available at https://github.com/sh-lee-prml/PeriodWave.
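
Two ideas in the abstract can be illustrated with a small, self-contained sketch: viewing a 1D waveform at several periods, and splitting it into frequency sub-bands without losing information. The code below is not taken from the PeriodWave repository. It assumes that a "period view" means folding the signal into rows of length `period` (as multi-period discriminators in GAN vocoders do), it picks the periods 2, 3, and 5 purely for illustration, and it uses a one-level Haar wavelet as the simplest example of a lossless discrete wavelet transform. All function names are hypothetical.

```python
import math


def reshape_by_period(wave, period):
    """Zero-pad a 1D signal to a multiple of `period`, then fold it into
    rows of length `period` (assumed meaning of a 'period view')."""
    pad = (-len(wave)) % period
    padded = list(wave) + [0.0] * pad
    return [padded[i:i + period] for i in range(0, len(padded), period)]


def haar_dwt(wave):
    """One-level Haar DWT: returns (low, high) sub-bands, each half as long."""
    if len(wave) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    low = [(wave[i] + wave[i + 1]) / s for i in range(0, len(wave), 2)]
    high = [(wave[i] - wave[i + 1]) / s for i in range(0, len(wave), 2)]
    return low, high


def haar_idwt(low, high):
    """Inverse of haar_dwt: interleaves the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for lo, hi in zip(low, high):
        out.append((lo + hi) / s)
        out.append((lo - hi) / s)
    return out


# Toy signal: 12 samples of a sinusoid with a period of 6 samples.
wave = [math.sin(2 * math.pi * n / 6) for n in range(12)]

# Illustrative periods only; coprime periods give views that do not
# simply repeat one another.
views = {p: reshape_by_period(wave, p) for p in (2, 3, 5)}

# Sub-band split and reconstruction; the error is at floating-point level.
low, high = haar_dwt(wave)
recon = haar_idwt(low, high)
max_err = max(abs(a - b) for a, b in zip(wave, recon))
```

With the period set to 3, every second row of the folded signal lines up with one cycle of the sinusoid, which is the kind of structure a period-aware estimator could exploit. The Haar round trip recovers the input up to rounding error, which is what "lossless" means here.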

Results

Task	Dataset	Metric	Value	Model
Speech Recognition	LibriTTS	M-STFT	1.0269	PeriodWave + FreeU
Speech Recognition	LibriTTS	PESQ	4.248	PeriodWave + FreeU
Speech Recognition	LibriTTS	Periodicity	0.0765	PeriodWave + FreeU
Speech Recognition	LibriTTS	V/UV F1	0.9651	PeriodWave + FreeU
Speech Synthesis	LibriTTS	M-STFT	1.0269	PeriodWave + FreeU
Speech Synthesis	LibriTTS	PESQ	4.248	PeriodWave + FreeU
Speech Synthesis	LibriTTS	Periodicity	0.0765	PeriodWave + FreeU
Speech Synthesis	LibriTTS	V/UV F1	0.9651	PeriodWave + FreeU
Accented Speech Recognition	LibriTTS	M-STFT	1.0269	PeriodWave + FreeU
Accented Speech Recognition	LibriTTS	PESQ	4.248	PeriodWave + FreeU
Accented Speech Recognition	LibriTTS	Periodicity	0.0765	PeriodWave + FreeU
Accented Speech Recognition	LibriTTS	V/UV F1	0.9651	PeriodWave + FreeU
