Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

aTENNuate: Optimized Real-time Speech Enhancement with Deep SSMs on Raw Audio

Yan Ru Pei, Ritik Shrivastava, FNU Sidharth

2024-09-05 | Denoising | Super-Resolution | Audio Denoising | Speech Enhancement | Speech Denoising

Abstract

We present aTENNuate, a simple deep state-space autoencoder configured for efficient online raw speech enhancement in an end-to-end fashion. The network's performance is primarily evaluated on raw speech denoising, with additional assessments on tasks such as super-resolution and de-quantization. We benchmark aTENNuate on the VoiceBank + DEMAND and the Microsoft DNS1 synthetic test sets. The network outperforms previous real-time denoising models in terms of PESQ score, parameter count, MACs, and latency. Even as a raw waveform processing model, the model maintains high fidelity to the clean signal with minimal audible artifacts. In addition, the model remains performant even when the noisy input is compressed down to 4000 Hz and 4 bits, suggesting general speech enhancement capabilities in low-resource environments. Try it out with: pip install attenuate
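To make the "4000 Hz and 4 bits" condition in the abstract concrete, here is a minimal sketch of how a waveform could be degraded that way. It is not the authors' preprocessing: the 16 kHz source rate, the block-averaging downsampler, the uniform quantizer, and the function names `decimate_by_averaging` and `quantize_uniform` are all assumptions made for illustration.

```python
import math


def decimate_by_averaging(samples, factor=4):
    """Reduce the sample rate by `factor` by averaging each block of samples.

    Averaging acts as a crude low-pass filter before decimation.
    With factor=4, an assumed 16 kHz input becomes 4 kHz.
    A trailing partial block is dropped.
    """
    n_blocks = len(samples) // factor
    return [
        sum(samples[i * factor:(i + 1) * factor]) / factor
        for i in range(n_blocks)
    ]


def quantize_uniform(samples, bits=4):
    """Snap samples in [-1, 1] onto 2**bits evenly spaced levels.

    Assumption: uniform quantization. The paper may use another scheme.
    """
    levels = 2 ** bits
    step = 2.0 / (levels - 1)
    quantized = []
    for x in samples:
        x = max(-1.0, min(1.0, x))        # clip to the valid range
        index = round((x + 1.0) / step)   # integer code in 0..levels-1
        quantized.append(index * step - 1.0)
    return quantized


# One second of a 440 Hz tone at an assumed 16 kHz sample rate.
sample_rate = 16000
tone = [0.8 * math.sin(2 * math.pi * 440 * t / sample_rate)
        for t in range(sample_rate)]

low_rate = decimate_by_averaging(tone, factor=4)   # 4000 samples per second
degraded = quantize_uniform(low_rate, bits=4)      # at most 16 distinct values
```

A signal degraded like `degraded` is the kind of input a super-resolution and de-quantization model would be asked to restore. A production pipeline would normally use a proper anti-aliasing resampler rather than block averaging.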

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Speech Enhancement | Deep Noise Suppression (DNS) Challenge | PESQ-WB | 2.98 | aTENNuate |
| Speech Enhancement | VoiceBank + DEMAND | CBAK | 2.85 | aTENNuate |
| Speech Enhancement | VoiceBank + DEMAND | COVL | 3.96 | aTENNuate |
| Speech Enhancement | VoiceBank + DEMAND | CSIG | 4.57 | aTENNuate |
| Speech Enhancement | VoiceBank + DEMAND | PESQ (wb) | 3.27 | aTENNuate |
| Speech Enhancement | VoiceBank + DEMAND | SI-SDR | 15.04 | aTENNuate |
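
Of the metrics listed above, SI-SDR (scale-invariant signal-to-distortion ratio, in dB) is simple enough to compute by hand. The following is a minimal sketch of the commonly used zero-mean definition, not the evaluation code behind the reported 15.04 value. The function name `si_sdr` and the toy signals are illustrative assumptions.

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant signal-to-distortion ratio in dB.

    Both inputs are equal-length sequences of samples. Each signal is
    mean-centered, the estimate is projected onto the reference, and the
    energy of that projection is compared to the energy of the residual.
    """
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    n = len(reference)
    mean_e = sum(estimate) / n
    mean_r = sum(reference) / n
    e = [x - mean_e for x in estimate]
    r = [x - mean_r for x in reference]

    ref_energy = sum(x * x for x in r)
    if ref_energy == 0:
        raise ValueError("reference has zero energy")

    # Optimal scaling of the reference that best matches the estimate.
    alpha = sum(a * b for a, b in zip(e, r)) / ref_energy
    target = [alpha * x for x in r]
    residual = [a - t for a, t in zip(e, target)]

    target_energy = sum(t * t for t in target)
    residual_energy = sum(x * x for x in residual)
    if residual_energy == 0:
        return math.inf
    if target_energy == 0:
        return -math.inf
    return 10 * math.log10(target_energy / residual_energy)


# Toy example: the error is orthogonal to the clean signal.
clean = [1.0, -1.0, 1.0, -1.0]
enhanced = [1.1, -0.9, 0.9, -1.1]
score = si_sdr(enhanced, clean)                          # about 20 dB
score_scaled = si_sdr([3 * x for x in enhanced], clean)  # unchanged by gain
```

Because the estimate is projected onto the reference before the comparison, multiplying the enhanced signal by a constant gain leaves the score unchanged, which is what "scale-invariant" refers to.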

Related Papers

fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models (2025-07-17)
SpectraLift: Physics-Guided Spectral-Inversion Network for Self-Supervised Hyperspectral Image Super-Resolution (2025-07-17)
Autoregressive Speech Enhancement via Acoustic Tokens (2025-07-17)
Similarity-Guided Diffusion for Contrastive Sequential Recommendation (2025-07-16)
HUG-VAS: A Hierarchical NURBS-Based Generative Model for Aortic Geometry Synthesis and Controllable Editing (2025-07-15)
AirLLM: Diffusion Policy-based Adaptive LoRA for Remote Fine-Tuning of LLM over the Air (2025-07-15)
P.808 Multilingual Speech Enhancement Testing: Approach and Results of URGENT 2025 Challenge (2025-07-15)