Speech Denoising in the Waveform Domain with Self-Attention

Zhifeng Kong, Wei Ping, Ambrish Dantrey, Bryan Catanzaro

2022-02-15

Denoising, Speech Enhancement, Speech Denoising

Abstract

In this work, we present CleanUNet, a causal speech denoising model that operates on the raw waveform. The proposed model is based on an encoder-decoder architecture combined with several self-attention blocks that refine its bottleneck representations, which is crucial for obtaining good results. The model is optimized through a set of losses defined over both waveforms and multi-resolution spectrograms. The proposed method outperforms state-of-the-art models in denoised speech quality across various objective and subjective evaluation metrics. We release our code and models at https://github.com/nvidia/cleanunet.
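The abstract mentions losses defined over both the waveform and multi-resolution spectrograms but does not give their exact form. The following is a minimal sketch of one common way to combine the two, assuming an L1 waveform term plus a multi-resolution STFT term made of spectral convergence and log-magnitude distance. The function names, the frame/hop sizes, and the equal weighting are illustrative assumptions, not the authors' implementation, and the naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def stft_magnitudes(signal, frame_len, hop):
    """Magnitude spectrogram via a naive DFT with a Hann window (no padding)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        bins = []
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            bins.append(abs(acc))
        frames.append(bins)
    return frames


def waveform_l1(clean, denoised):
    """Mean absolute error between two equal-length waveforms."""
    return sum(abs(c - d) for c, d in zip(clean, denoised)) / len(clean)


def stft_loss(clean, denoised, frame_len, hop, eps=1e-7):
    """Spectral convergence plus mean log-magnitude L1 at one resolution (assumed form)."""
    ref = stft_magnitudes(clean, frame_len, hop)
    est = stft_magnitudes(denoised, frame_len, hop)
    pairs = [(r, e) for rf, ef in zip(ref, est) for r, e in zip(rf, ef)]
    diff_sq = sum((r - e) ** 2 for r, e in pairs)
    ref_sq = sum(r ** 2 for r, _ in pairs)
    spectral_convergence = math.sqrt(diff_sq) / (math.sqrt(ref_sq) + eps)
    log_mag = sum(abs(math.log(r + eps) - math.log(e + eps)) for r, e in pairs) / len(pairs)
    return spectral_convergence + log_mag


def combined_loss(clean, denoised, resolutions=((32, 8), (64, 16), (128, 32))):
    """Waveform L1 plus the STFT loss averaged over several (frame_len, hop) pairs."""
    multi_res = sum(stft_loss(clean, denoised, f, h) for f, h in resolutions) / len(resolutions)
    return waveform_l1(clean, denoised) + multi_res


# Toy signals: a low-frequency tone, and the same tone with a high-frequency disturbance.
clean = [math.sin(2 * math.pi * 5 * n / 256) for n in range(256)]
noisy = [c + 0.1 * math.sin(2 * math.pi * 60 * n / 256) for n, c in enumerate(clean)]

print(combined_loss(clean, clean))
print(combined_loss(clean, noisy))
```

The loss is zero when the two signals are identical and positive otherwise. Using several frame lengths lets the spectral term penalize errors at both fine time resolution (short frames) and fine frequency resolution (long frames). A practical training setup would compute the STFT with a tensor library rather than this naive DFT.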

Results

Task	Dataset	Metric	Value	Model
Speech Enhancement	Deep Noise Suppression (DNS) Challenge	PESQ-NB	3.551	CleanUNet
Speech Enhancement	Deep Noise Suppression (DNS) Challenge	PESQ-WB	3.146	CleanUNet
