CleanUNet 2: A Hybrid Speech Denoising Model on Waveform and Spectrogram

Zhifeng Kong, Wei Ping, Ambrish Dantrey, Bryan Catanzaro

2023-09-12

Denoising, Speech Synthesis, Speech Enhancement, Speech Denoising

Abstract

In this work, we present CleanUNet 2, a speech denoising model that combines the advantages of waveform and spectrogram denoisers to achieve the best of both worlds. CleanUNet 2 uses a two-stage framework inspired by popular speech synthesis methods that consist of a waveform model and a spectrogram model. Specifically, CleanUNet 2 builds upon CleanUNet, the state-of-the-art waveform denoiser, and further boosts its performance by taking predicted spectrograms from a spectrogram denoiser as input. We demonstrate that CleanUNet 2 outperforms previous methods in terms of various objective and subjective evaluations.
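The two-stage data flow described in the abstract can be illustrated with a minimal PyTorch sketch. This is not the CleanUNet 2 architecture: the abstract does not specify layer designs, STFT settings, or how the predicted spectrogram is fed to the waveform model, so the class names (SpecDenoiser, WaveDenoiser), the tiny convolutional stacks, the n_fft and hop values, and the conditioning by upsampling and channel concatenation are all assumptions made for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpecDenoiser(nn.Module):
    """Hypothetical stage 1: noisy magnitude spectrogram -> denoised estimate."""

    def __init__(self, n_freq):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_freq, n_freq, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(n_freq, n_freq, kernel_size=3, padding=1),
        )

    def forward(self, spec):  # spec: (batch, n_freq, frames)
        # ReLU keeps the predicted magnitudes non-negative.
        return torch.relu(self.net(spec))


class WaveDenoiser(nn.Module):
    """Hypothetical stage 2: waveform denoiser conditioned on the stage-1 output."""

    def __init__(self, n_freq, hidden=16):
        super().__init__()
        # Assumed conditioning: project the spectrogram to one channel.
        self.cond = nn.Conv1d(n_freq, 1, kernel_size=1)
        self.net = nn.Sequential(
            nn.Conv1d(2, hidden, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.Conv1d(hidden, 1, kernel_size=9, padding=4),
        )

    def forward(self, wav, spec):  # wav: (batch, 1, samples)
        c = self.cond(spec)
        # Upsample the frame-rate conditioning signal to the sample rate.
        c = F.interpolate(c, size=wav.shape[-1], mode="linear", align_corners=False)
        return self.net(torch.cat([wav, c], dim=1))


def denoise(noisy_wav, spec_model, wave_model, n_fft=512, hop=128):
    """Run both stages on a batch of waveforms shaped (batch, samples)."""
    window = torch.hann_window(n_fft, device=noisy_wav.device)
    stft = torch.stft(noisy_wav, n_fft, hop_length=hop, window=window,
                      return_complex=True)
    noisy_spec = stft.abs()                 # (batch, n_fft // 2 + 1, frames)
    pred_spec = spec_model(noisy_spec)      # stage 1: spectrogram denoiser
    out = wave_model(noisy_wav.unsqueeze(1), pred_spec)  # stage 2: waveform denoiser
    return out.squeeze(1)
```

With untrained weights the output is meaningless as audio; the sketch only shows that the waveform model receives both the noisy waveform and the predicted spectrogram, and returns a waveform of the same length as its input.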

Results

Task	Dataset	Metric	Value	Model
Speech Enhancement	Deep Noise Suppression (DNS) Challenge	PESQ-NB	3.658	CleanUNet-2
Speech Enhancement	Deep Noise Suppression (DNS) Challenge	PESQ-WB	3.262	CleanUNet-2
