aTENNuate: Optimized Real-time Speech Enhancement with Deep SSMs on Raw Audio

Yan Ru Pei, Ritik Shrivastava, FNU Sidharth

2024-09-05

Tasks: Denoising, Super-Resolution, Audio Denoising, Speech Enhancement, Speech Denoising

Abstract

We present aTENNuate, a simple deep state-space autoencoder configured for efficient online raw speech enhancement in an end-to-end fashion. The network's performance is primarily evaluated on raw speech denoising, with additional assessments on tasks such as super-resolution and de-quantization. We benchmark aTENNuate on the VoiceBank + DEMAND and the Microsoft DNS1 synthetic test sets. The network outperforms previous real-time denoising models in terms of PESQ score, parameter count, MACs, and latency. Even though it processes raw waveforms, the model maintains high fidelity to the clean signal with minimal audible artifacts. In addition, the model remains performant even when the noisy input is compressed down to a 4000 Hz sampling rate and 4-bit resolution, suggesting general speech enhancement capabilities in low-resource environments. Try it out with: pip install attenuate
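
The low-resource setting described above (noisy input reduced to a 4000 Hz sampling rate and 4 bits) can be simulated roughly as follows. This is a minimal sketch, not the authors' degradation pipeline: it assumes float input in [-1, 1] at 16 kHz, uses a crude block-average filter before decimation, a uniform 4-bit quantizer, and zero-order hold to return to the original rate. The function names (downsample_box, upsample_hold, quantize_uniform, degrade) are hypothetical.

```python
import math

def downsample_box(x, factor):
    """Crude anti-aliased decimation: average each block of `factor` samples.
    Trailing samples that do not fill a whole block are dropped."""
    n = len(x) // factor
    return [sum(x[i * factor:(i + 1) * factor]) / factor for i in range(n)]

def upsample_hold(y, factor):
    """Zero-order hold back to the original rate (repeat each sample)."""
    return [v for v in y for _ in range(factor)]

def quantize_uniform(x, bits):
    """Uniform mid-rise quantizer on [-1, 1] with 2**bits levels."""
    levels = 2 ** bits
    step = 2.0 / levels
    out = []
    for v in x:
        v = min(max(v, -1.0), 1.0 - 1e-12)      # clip so v = 1.0 maps to the top level
        idx = math.floor((v + 1.0) / step)       # level index in [0, levels - 1]
        out.append(-1.0 + (idx + 0.5) * step)    # reconstruct at the bin centre
    return out

def degrade(x, sr_in=16000, sr_low=4000, bits=4):
    """Hypothetical low-resource degradation: downsample, quantize, hold back up."""
    if sr_in % sr_low != 0:
        raise ValueError("sr_in must be an integer multiple of sr_low")
    factor = sr_in // sr_low
    low = downsample_box(x, factor)
    return upsample_hold(quantize_uniform(low, bits), factor)
```

A realistic pipeline would more likely use a proper anti-aliasing resampler (for example a polyphase or windowed-sinc filter); the box average is used here only to keep the sketch dependency-free. With 4 bits the output can take at most 16 distinct amplitude values.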

Results

Task	Dataset	Metric	Value	Model
Speech Enhancement	Deep Noise Suppression (DNS) Challenge	PESQ (wb)	2.98	aTENNuate
Speech Enhancement	VoiceBank + DEMAND	CBAK	2.85	aTENNuate
Speech Enhancement	VoiceBank + DEMAND	COVL	3.96	aTENNuate
Speech Enhancement	VoiceBank + DEMAND	CSIG	4.57	aTENNuate
Speech Enhancement	VoiceBank + DEMAND	PESQ (wb)	3.27	aTENNuate
Speech Enhancement	VoiceBank + DEMAND	SI-SDR (dB)	15.04	aTENNuate
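
SI-SDR projects the estimate onto the clean reference and compares the energy of that projection with the energy of the residual, which makes the score invariant to the overall gain of the estimate. A minimal pure-Python sketch of the commonly used definition follows; the mean removal and the eps guard are conventions assumed here, and the exact evaluation script behind the 15.04 dB figure is not specified on this page.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant signal-to-distortion ratio in dB."""
    if len(estimate) != len(reference) or not reference:
        raise ValueError("signals must be non-empty and the same length")
    # Remove the mean of each signal so constant offsets do not affect the score
    me = sum(estimate) / len(estimate)
    mr = sum(reference) / len(reference)
    est = [v - me for v in estimate]
    ref = [v - mr for v in reference]
    # Optimal scaling of the reference: alpha = <est, ref> / ||ref||^2
    alpha = _dot(est, ref) / (_dot(ref, ref) + eps)
    target = [alpha * r for r in ref]
    # Everything not explained by the scaled reference counts as distortion
    residual = [e - t for e, t in zip(est, target)]
    return 10.0 * math.log10((_dot(target, target) + eps) / (_dot(residual, residual) + eps))
```

For example, with reference [1, -1, 1, -1] and estimate [1.1, -0.9, 0.9, -1.1], the residual energy is 1% of the target energy, giving about 20 dB; multiplying the estimate by any nonzero constant leaves the score unchanged.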
