Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation

Sang-Hoon Lee, Ha-Yeong Choi, Seong-Whan Lee

2024-08-14 · Text to Speech · Speech Synthesis · text-to-speech

Abstract

Recently, universal waveform generation tasks have been investigated under various out-of-distribution conditions. Although GAN-based methods have shown their strength in fast waveform generation, they are vulnerable to train-inference mismatch scenarios such as two-stage text-to-speech. Meanwhile, diffusion-based models have shown powerful generative performance in other domains; however, they have stayed out of the limelight in waveform generation because of their slow inference speed. Above all, there is no generator architecture that can explicitly disentangle the natural periodic features of high-resolution waveform signals. In this paper, we propose PeriodWave, a novel universal waveform generation model. First, we introduce a period-aware flow matching estimator that can capture the periodic features of the waveform signal when estimating the vector fields. Additionally, we utilize a multi-period estimator that avoids overlaps to capture different periodic features of waveform signals. Although increasing the number of periods can improve performance significantly, it also increases the computational cost. To mitigate this issue, we also propose a single period-conditional universal estimator that can run the feed-forward pass for all periods in parallel through period-wise batch inference. Additionally, we utilize the discrete wavelet transform to losslessly disentangle the frequency information of waveform signals for high-frequency modeling, and introduce FreeU to reduce high-frequency noise in waveform generation. The experimental results demonstrate that our model outperforms previous models in both Mel-spectrogram reconstruction and text-to-speech tasks. All source code will be available at https://github.com/sh-lee-prml/PeriodWave.
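
Two ideas from the abstract can be illustrated with a minimal sketch: folding a waveform by a period so that samples at the same phase line up, and a one-level discrete wavelet transform that splits a signal into low and high bands without losing information. The code below is an illustration only and is not taken from the PeriodWave repository. The function names, the zero-padding choice, the Haar wavelet, and the example periods (2, 3, 5) are all assumptions made here for demonstration.

```python
import math


def period_reshape(wave, period):
    """Fold a 1D waveform into rows of length `period` (hypothetical helper).

    The tail is zero-padded so the length is a multiple of `period`.
    Column j of the result then holds samples j, j + period, j + 2*period, ...
    """
    pad = (-len(wave)) % period
    padded = list(wave) + [0.0] * pad
    return [padded[i:i + period] for i in range(0, len(padded), period)]


def haar_dwt(wave):
    """One-level Haar DWT: returns (low, high) bands at half the length."""
    if len(wave) % 2 != 0:
        raise ValueError("input length must be even")
    s = math.sqrt(2.0)
    low = [(wave[i] + wave[i + 1]) / s for i in range(0, len(wave), 2)]
    high = [(wave[i] - wave[i + 1]) / s for i in range(0, len(wave), 2)]
    return low, high


def haar_idwt(low, high):
    """Inverse of haar_dwt: interleaves the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for l, h in zip(low, high):
        out.append((l + h) / s)
        out.append((l - h) / s)
    return out


if __name__ == "__main__":
    # A toy signal; real waveforms would be much longer.
    wave = [math.sin(2.0 * math.pi * n / 6.0) for n in range(12)]

    # Illustrative periods only; the abstract does not list the actual values.
    for p in (2, 3, 5):
        rows = period_reshape(wave, p)
        print(f"period {p}: {len(rows)} rows x {p} columns")

    low, high = haar_dwt(wave)
    restored = haar_idwt(low, high)
    err = max(abs(a - b) for a, b in zip(wave, restored))
    print(f"max reconstruction error: {err:.2e}")
```

The round trip through `haar_dwt` and `haar_idwt` recovers the input up to floating-point error, which is the sense in which such a transform is lossless. Which wavelet and which periods the model actually uses would need to be checked against the paper or the official code.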

Results

Task | Dataset | Metric | Value | Model
Speech Recognition | LibriTTS | M-STFT | 1.0269 | PeriodWave + FreeU
Speech Recognition | LibriTTS | PESQ | 4.248 | PeriodWave + FreeU
Speech Recognition | LibriTTS | Periodicity | 0.0765 | PeriodWave + FreeU
Speech Recognition | LibriTTS | V/UV F1 | 0.9651 | PeriodWave + FreeU
Speech Synthesis | LibriTTS | M-STFT | 1.0269 | PeriodWave + FreeU
Speech Synthesis | LibriTTS | PESQ | 4.248 | PeriodWave + FreeU
Speech Synthesis | LibriTTS | Periodicity | 0.0765 | PeriodWave + FreeU
Speech Synthesis | LibriTTS | V/UV F1 | 0.9651 | PeriodWave + FreeU
Accented Speech Recognition | LibriTTS | M-STFT | 1.0269 | PeriodWave + FreeU
Accented Speech Recognition | LibriTTS | PESQ | 4.248 | PeriodWave + FreeU
Accented Speech Recognition | LibriTTS | Periodicity | 0.0765 | PeriodWave + FreeU
Accented Speech Recognition | LibriTTS | V/UV F1 | 0.9651 | PeriodWave + FreeU
