Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Accelerating High-Fidelity Waveform Generation via Adversarial Flow Matching Optimization

Sang-Hoon Lee, Ha-Yeong Choi, Seong-Whan Lee

2024-08-15 · Speech Synthesis

Abstract

This paper introduces PeriodWave-Turbo, a high-fidelity and highly efficient waveform generation model based on adversarial flow matching optimization. Conditional flow matching (CFM) generative models have recently been adopted successfully for waveform generation, training with a single vector field estimation objective. Although these models can generate high-fidelity waveform signals, they require significantly more ODE steps than GAN-based models, which need only a single generation step. In addition, the generated samples often lack high-frequency information because noisy vector field estimation fails to ensure high-frequency reproduction. To address these limitations, we enhance pre-trained CFM-based generative models by modifying them into fixed-step generators, and we use reconstruction losses and adversarial feedback to accelerate high-fidelity waveform generation. With adversarial flow matching optimization, only 1,000 steps of fine-tuning are required to achieve state-of-the-art performance across various objective metrics. Moreover, we reduce the number of inference steps from 16 to 2 or 4, which significantly speeds up inference. Finally, by scaling up the PeriodWave backbone from 29M to 70M parameters for improved generalization, PeriodWave-Turbo achieves unprecedented performance, with a perceptual evaluation of speech quality (PESQ) score of 4.454 on the LibriTTS dataset. Audio samples, source code, and checkpoints will be available at https://github.com/sh-lee-prml/PeriodWave.
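The difference between CFM training and fixed-step ODE sampling can be sketched in plain Python. This is an illustration under stated assumptions, not PeriodWave's implementation: it assumes the straight-line conditional path x_t = (1 - (1 - sigma_min) t) x0 + t x1 that is commonly used in flow matching, treats a waveform as a short list of floats, omits mel-spectrogram conditioning, and uses hypothetical helper names (`cfm_training_pair`, `euler_sample`, `straight_line_field`). Training regresses a network onto the velocity target u_t; sampling integrates the vector field with a fixed number of Euler steps, so `num_steps` plays the role of the ODE step count (16 versus 2 or 4) mentioned above.

```python
import random
from typing import Callable, List, Optional, Tuple

Vector = List[float]
# A vector field maps (x, t) to a velocity dx/dt. In a real vocoder this would be a
# neural network conditioned on a mel-spectrogram; conditioning is omitted here.
VectorField = Callable[[Vector, float], Vector]


def cfm_training_pair(
    x1: Vector, sigma_min: float = 1e-4, rng: Optional[random.Random] = None
) -> Tuple[Vector, float, Vector]:
    """Sample one CFM regression example (x_t, t, u_t) for a clean signal x1.

    Assumes the straight-line conditional path
        x_t = (1 - (1 - sigma_min) * t) * x0 + t * x1,  x0 ~ N(0, I),
    whose time derivative u_t = x1 - (1 - sigma_min) * x0 is the regression target.
    """
    if rng is None:
        rng = random.Random()
    t = rng.random()  # t ~ U[0, 1)
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # Gaussian noise sample
    xt = [(1.0 - (1.0 - sigma_min) * t) * a + t * b for a, b in zip(x0, x1)]
    ut = [b - (1.0 - sigma_min) * a for a, b in zip(x0, x1)]
    return xt, t, ut


def euler_sample(v: VectorField, x0: Vector, num_steps: int) -> Vector:
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with a fixed number of Euler steps.

    Each step calls v once, so num_steps equals the number of vector field
    (network) evaluations per generated waveform.
    """
    dt = 1.0 / num_steps
    x = list(x0)
    for k in range(num_steps):
        t = k * dt
        velocity = v(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, velocity)]
    return x


def straight_line_field(target: Vector) -> VectorField:
    """Hypothetical toy stand-in for a trained network (sigma_min = 0).

    On a straight path toward `target`, the exact velocity at (x, t) is
    (target - x) / (1 - t), so Euler integration reaches `target` for any
    step count. Euler never evaluates t = 1, so there is no division by zero.
    """
    def v(x: Vector, t: float) -> Vector:
        return [(b - a) / (1.0 - t) for a, b in zip(x, target)]
    return v


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [0.5, -0.25, 0.1]
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    field = straight_line_field(clean)
    for steps in (16, 4, 2):
        print(steps, euler_sample(field, noise, steps))
```

In this toy case every step count lands on the target. With a learned, imperfect vector field, fewer Euler steps generally mean larger integration error, which is the gap that fine-tuning a fixed-step generator is meant to close.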

Results

Task                         Dataset   Metric       Value   Model
Speech Recognition           LibriTTS  M-STFT       0.7358  PeriodWave-Turbo-L
Speech Recognition           LibriTTS  PESQ         4.454   PeriodWave-Turbo-L
Speech Recognition           LibriTTS  Periodicity  0.0528  PeriodWave-Turbo-L
Speech Recognition           LibriTTS  V/UV F1      0.9756  PeriodWave-Turbo-L
Speech Synthesis             LibriTTS  M-STFT       0.7358  PeriodWave-Turbo-L
Speech Synthesis             LibriTTS  PESQ         4.454   PeriodWave-Turbo-L
Speech Synthesis             LibriTTS  Periodicity  0.0528  PeriodWave-Turbo-L
Speech Synthesis             LibriTTS  V/UV F1      0.9756  PeriodWave-Turbo-L
Accented Speech Recognition  LibriTTS  M-STFT       0.7358  PeriodWave-Turbo-L
Accented Speech Recognition  LibriTTS  PESQ         4.454   PeriodWave-Turbo-L
Accented Speech Recognition  LibriTTS  Periodicity  0.0528  PeriodWave-Turbo-L
Accented Speech Recognition  LibriTTS  V/UV F1      0.9756  PeriodWave-Turbo-L

M-STFT: multi-resolution STFT distance (lower is better). PESQ: perceptual evaluation of speech quality (higher is better). Periodicity: periodicity RMSE (lower is better). V/UV F1: F1 score of voiced/unvoiced classification (higher is better).
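The metric names in the results table can be made concrete with a small standard-library sketch of how such metrics are commonly computed. This is an assumption-laden illustration, not the evaluation code behind these numbers: all helper names are hypothetical; V/UV F1 and periodicity RMSE assume per-frame voicing flags and periodicity values already extracted by a pitch tracker that this page does not specify; and the M-STFT function combines spectral convergence with a log-magnitude L1 term over a few toy STFT resolutions, using a naive DFT so that no third-party library is needed. The resolutions, window, and term weighting behind the reported 0.7358 may differ.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def vuv_f1(ref_voiced: Sequence[bool], gen_voiced: Sequence[bool]) -> float:
    """F1 score of per-frame voiced/unvoiced decisions, with 'voiced' as the positive class.

    Returns 0.0 when there are no true positives (including all-unvoiced inputs).
    """
    if len(ref_voiced) != len(gen_voiced):
        raise ValueError("voicing sequences must have the same number of frames")
    pairs = list(zip(ref_voiced, gen_voiced))
    tp = sum(1 for r, g in pairs if r and g)
    fp = sum(1 for r, g in pairs if g and not r)
    fn = sum(1 for r, g in pairs if r and not g)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def periodicity_rmse(ref: Sequence[float], gen: Sequence[float]) -> float:
    """Root-mean-square error between two per-frame periodicity tracks (lower is better)."""
    if len(ref) != len(gen) or not ref:
        raise ValueError("periodicity tracks must be non-empty and equally long")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ref, gen)) / len(ref))


def stft_magnitude(x: Sequence[float], n_fft: int, hop: int) -> List[List[float]]:
    """Magnitude STFT using a naive DFT and a periodic Hann window (slow, illustration only)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        seg = [x[start + n] * window[n] for n in range(n_fft)]
        frames.append([
            abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft)))
            for k in range(n_fft // 2 + 1)  # one-sided spectrum
        ])
    return frames


def multi_resolution_stft_distance(
    ref: Sequence[float],
    gen: Sequence[float],
    resolutions: Sequence[Tuple[int, int]] = ((64, 16), (32, 8), (16, 4)),  # toy (n_fft, hop)
    eps: float = 1e-7,
) -> float:
    """Spectral convergence plus mean absolute log-magnitude difference, averaged over resolutions."""
    if len(ref) != len(gen):
        raise ValueError("signals must have the same length")
    if len(ref) < max(n_fft for n_fft, _ in resolutions):
        raise ValueError("signal is shorter than the largest FFT size")
    total = 0.0
    for n_fft, hop in resolutions:
        r = [m for frame in stft_magnitude(ref, n_fft, hop) for m in frame]
        g = [m for frame in stft_magnitude(gen, n_fft, hop) for m in frame]
        diff_norm = math.sqrt(sum((a - b) ** 2 for a, b in zip(r, g)))
        ref_norm = math.sqrt(sum(a * a for a in r))
        spectral_convergence = diff_norm / max(ref_norm, eps)
        log_mag_l1 = sum(abs(math.log(a + eps) - math.log(b + eps)) for a, b in zip(r, g)) / len(r)
        total += spectral_convergence + log_mag_l1
    return total / len(resolutions)
```

Because the naive DFT costs O(n_fft²) per frame, this version is only suitable for checking the definitions on short signals; a real evaluation would use an FFT-based STFT with much larger FFT sizes.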

Related Papers

NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech · 2025-07-17
Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis · 2025-07-08
A Hybrid Machine Learning Framework for Optimizing Crop Selection via Agronomic and Economic Forecasting · 2025-07-06
DeepGesture: A conversational gesture synthesis system based on emotions and semantics · 2025-07-03
OpusLM: A Family of Open Unified Speech Language Models · 2025-06-21
RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching · 2025-06-20
InstructTTSEval: Benchmarking Complex Natural-Language Instruction Following in Text-to-Speech Systems · 2025-06-19
An accurate and revised version of optical character recognition-based speech synthesis using LabVIEW · 2025-06-18