Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Robust One-step Speech Enhancement via Consistency Distillation

Liang Xu, Longfei Felix Yan, W. Bastiaan Kleijn

2025-07-08 · Speech Enhancement
Paper · PDF · Code (official)

Abstract

Diffusion models have shown strong performance in speech enhancement, but their real-time applicability has been limited by multi-step iterative sampling. Consistency distillation has recently emerged as a promising alternative by distilling a one-step consistency model from a multi-step diffusion-based teacher model. However, distilled consistency models are inherently biased towards the sampling trajectory of the teacher model, making them less robust to noise and prone to inheriting inaccuracies from the teacher model. To address this limitation, we propose ROSE-CD: Robust One-step Speech Enhancement via Consistency Distillation, a novel approach for distilling a one-step consistency model. Specifically, we introduce a randomized learning trajectory to improve the model's robustness to noise. Furthermore, we jointly optimize the one-step model with two time-domain auxiliary losses, enabling it to recover from teacher-induced errors and surpass the teacher model in overall performance. This is the first pure one-step consistency distillation model for diffusion-based speech enhancement, achieving 54 times faster inference speed and superior performance compared to its 30-step teacher model. Experiments on the VoiceBank-DEMAND dataset demonstrate that the proposed model achieves state-of-the-art performance in terms of speech quality. Moreover, its generalization ability is validated on both an out-of-domain dataset and real-world noisy recordings.
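The recipe described in the abstract — perturb the clean signal at a randomly drawn point on the trajectory rather than following the teacher's fixed sampling path, regress the one-step student onto the teacher's consistency target, and add time-domain auxiliary losses against the clean reference — can be sketched as a single training-loss computation. This is a toy illustration under our own assumptions (the names `student`, `teacher_denoise`, and the loss weights are hypothetical), not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def teacher_denoise(x_t, t):
    # Hypothetical stand-in for the 30-step diffusion teacher's
    # one-step-back estimate; here simply a damped copy of the input.
    return x_t * (1.0 - 0.5 * t)

def student(x_t, t, w):
    # Toy linear one-step consistency model: maps a noisy point at
    # time t directly to a clean-speech estimate.
    return w * x_t * (1.0 - t)

def distillation_loss(w, clean, noise_level=1.0):
    # Randomized learning trajectory: draw a random time/noise level
    # and perturb the clean signal there, instead of replaying the
    # teacher's deterministic sampling trajectory.
    t = rng.uniform(0.1, 1.0)
    x_t = clean + noise_level * t * rng.standard_normal(clean.shape)

    # Consistency target: the teacher's estimate one small step earlier.
    target = teacher_denoise(x_t, t - 0.05)
    pred = student(x_t, t, w)

    # Consistency loss plus two time-domain auxiliary losses (here an
    # L1 term and an energy-matching term against the clean reference),
    # which let the student recover from teacher-induced errors.
    l_consistency = np.mean((pred - target) ** 2)
    l_l1 = np.mean(np.abs(pred - clean))
    l_energy = abs(np.mean(pred ** 2) - np.mean(clean ** 2))
    return l_consistency + 0.5 * l_l1 + 0.1 * l_energy

clean = np.sin(2 * np.pi * np.linspace(0, 1, 160))  # 10 ms at 16 kHz
loss = distillation_loss(1.0, clean)
```

In the paper's actual setup the student is a deep network trained by gradient descent on such a combined objective; the sketch only shows how the three loss terms fit together.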

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Speech Enhancement | VoiceBank+DEMAND | CBAK | 3.37 | ROSE-CD (PESQ) |
| Speech Enhancement | VoiceBank+DEMAND | COVL | 4.3 | ROSE-CD (PESQ) |
| Speech Enhancement | VoiceBank+DEMAND | CSIG | 4.63 | ROSE-CD (PESQ) |
| Speech Enhancement | VoiceBank+DEMAND | ESTOI | 0.83 | ROSE-CD (PESQ) |
| Speech Enhancement | VoiceBank+DEMAND | PESQ (wb) | 3.99 | ROSE-CD (PESQ) |
| Speech Enhancement | VoiceBank+DEMAND | Para. (M) | 65 | ROSE-CD (PESQ) |
| Speech Enhancement | VoiceBank+DEMAND | SI-SDR | 0.4 | ROSE-CD (PESQ) |
| Speech Enhancement | VoiceBank+DEMAND | SSNR | 0.927 | ROSE-CD (PESQ) |
| Speech Enhancement | VoiceBank+DEMAND | STOI | 92.6 | ROSE-CD (PESQ) |
| Speech Enhancement | VoiceBank+DEMAND | DNSMOS | 3.01 | ROSE-CD (PESQ) |
| Speech Enhancement | VoiceBank+DEMAND | DNSMOS BAK | 4.29 | ROSE-CD (PESQ) |
| Speech Enhancement | VoiceBank+DEMAND | DNSMOS OVRL | 3.28 | ROSE-CD (PESQ) |
| Speech Enhancement | VoiceBank+DEMAND | DNSMOS SIG | 3.52 | ROSE-CD (PESQ) |
| Speech Enhancement | VoiceBank+DEMAND | CBAK | 3.33 | ROSE-CD |
| Speech Enhancement | VoiceBank+DEMAND | COVL | 4.04 | ROSE-CD |
| Speech Enhancement | VoiceBank+DEMAND | CSIG | 4.523 | ROSE-CD |
| Speech Enhancement | VoiceBank+DEMAND | ESTOI | 0.87 | ROSE-CD |
| Speech Enhancement | VoiceBank+DEMAND | PESQ (wb) | 3.49 | ROSE-CD |
| Speech Enhancement | VoiceBank+DEMAND | Para. (M) | 65 | ROSE-CD |
| Speech Enhancement | VoiceBank+DEMAND | SI-SDR | 17.8 | ROSE-CD |
| Speech Enhancement | VoiceBank+DEMAND | SSNR | 3.34 | ROSE-CD |
| Speech Enhancement | VoiceBank+DEMAND | STOI | 94.73 | ROSE-CD |
| Speech Enhancement | VoiceBank+DEMAND | DNSMOS | 3.48 | ROSE-CD |
| Speech Enhancement | VoiceBank+DEMAND | DNSMOS BAK | 4.34 | ROSE-CD |
| Speech Enhancement | VoiceBank+DEMAND | DNSMOS OVRL | 3.7 | ROSE-CD |
| Speech Enhancement | VoiceBank+DEMAND | DNSMOS SIG | 4.02 | ROSE-CD |
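Several of the metrics above are standard. SI-SDR (scale-invariant signal-to-distortion ratio), for instance, is defined directly in the time domain: project the estimate onto the clean reference, then compare target energy to residual energy in dB. A minimal NumPy version (the function name `si_sdr` and the test signals are ours, not from the paper's code):

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between an enhanced estimate and
    a clean reference signal."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Optimal scaling of the reference toward the estimate.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    residual = estimate - target
    return 10.0 * np.log10(
        (np.dot(target, target) + eps) / (np.dot(residual, residual) + eps)
    )

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s at 16 kHz
noisy = clean + 0.3 * rng.standard_normal(clean.size)
enhanced = clean + 0.05 * rng.standard_normal(clean.size)
# An estimate closer to the clean reference scores higher.
```

Because the metric is scale-invariant, uniformly rescaling the estimate leaves the score unchanged, which is why it is preferred over plain SNR for enhancement systems whose output gain is arbitrary.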

Related Papers

Autoregressive Speech Enhancement via Acoustic Tokens (2025-07-17)
P.808 Multilingual Speech Enhancement Testing: Approach and Results of URGENT 2025 Challenge (2025-07-15)
Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis (2025-07-08)
MambAttention: Mamba with Multi-Head Attention for Generalizable Single-Channel Speech Enhancement (2025-07-01)
Frequency-Weighted Training Losses for Phoneme-Level DNN-based Speech Enhancement (2025-06-23)
EDNet: A Distortion-Agnostic Speech Enhancement Framework with Gating Mamba Mechanism and Phase Shift-Invariant Training (2025-06-19)
A Comparative Evaluation of Deep Learning Models for Speech Enhancement in Real-World Noisy Environments (2025-06-17)
Efficient Speech Enhancement via Embeddings from Pre-trained Generative Audioencoders (2025-06-13)