
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement

Sherif Abdulatif, Ruizhe Cao, Bin Yang

2022-09-22
Tasks: Speech Recognition, Denoising, Automatic Speech Recognition, Super-Resolution, Automatic Speech Recognition (ASR), speech-recognition, Audio Super-Resolution, Speech Separation, Speech Enhancement, Speech Denoising

Abstract

In this work, we further develop the conformer-based metric generative adversarial network (CMGAN) model for speech enhancement (SE) in the time-frequency (TF) domain. This paper builds on our previous work with a more in-depth analysis, including extensive ablation studies on model inputs and architectural design choices. We rigorously test the model's ability to generalize to unseen noise types and distortions, and we support our claims with DNSMOS measurements and listening tests. Rather than focusing exclusively on speech denoising, we extend this work to the dereverberation and super-resolution tasks, which required exploring several architectural changes, specifically in the metric discriminator scores and the masking techniques. Notably, this is among the earliest works to attempt complex TF-domain super-resolution. Our findings show that CMGAN outperforms existing state-of-the-art methods in all three major speech enhancement tasks: denoising, dereverberation, and super-resolution. For example, on the Voice Bank+DEMAND denoising benchmark, CMGAN exceeds the performance of prior models, attaining a PESQ score of 3.41 and an SSNR of 11.10 dB. Audio samples and CMGAN implementations are available online.
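The metric discriminator is what sets a Metric-GAN apart from a conventional GAN: instead of labeling inputs as real or fake, it is trained to predict a normalized quality score (such as PESQ) for an enhanced signal, and the generator is pushed toward the highest predicted score. The following is a minimal sketch of least-squares objectives in that style, written on plain Python floats. It is an illustration under stated assumptions, not the official CMGAN code: the discriminator outputs are taken as given scalars (assumed to be conditioned on the clean reference), and the linear rescaling of wideband PESQ from [1, 4.5] to [0, 1] as well as all function names are hypothetical choices.

```python
# Hypothetical sketch of least-squares Metric-GAN objectives (not the official CMGAN code).
# Assumptions: D(clean, x) returns one scalar per utterance, and the target metric is
# wideband PESQ linearly rescaled from an assumed range [1, 4.5] to [0, 1].
from typing import Sequence


def normalize_pesq(pesq_wb: float, lo: float = 1.0, hi: float = 4.5) -> float:
    """Map a wideband PESQ score to [0, 1], clipping values outside the assumed range."""
    scaled = (pesq_wb - lo) / (hi - lo)
    return min(max(scaled, 0.0), 1.0)


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def discriminator_loss(d_clean: Sequence[float],
                       d_enhanced: Sequence[float],
                       pesq_enhanced: Sequence[float]) -> float:
    """Push D(clean, clean) toward 1 and D(clean, enhanced) toward the enhanced signal's normalized PESQ."""
    targets = [normalize_pesq(p) for p in pesq_enhanced]
    return _mean([(dc - 1.0) ** 2 + (de - t) ** 2
                  for dc, de, t in zip(d_clean, d_enhanced, targets)])


def generator_adversarial_loss(d_enhanced: Sequence[float]) -> float:
    """Push D(clean, enhanced) toward 1, i.e. toward the best achievable normalized score."""
    return _mean([(de - 1.0) ** 2 for de in d_enhanced])


if __name__ == "__main__":
    # Made-up discriminator outputs and PESQ values for a batch of two utterances.
    d_clean = [0.95, 0.90]
    d_enh = [0.60, 0.70]
    pesq = [3.0, 3.4]
    print("D loss:", discriminator_loss(d_clean, d_enh, pesq))
    print("G adversarial loss:", generator_adversarial_loss(d_enh))
```

Because the discriminator regresses onto a metric rather than a binary label, the generator's adversarial term effectively optimizes a learned, differentiable proxy of PESQ. In practice such a term is typically weighted and combined with reconstruction losses on the spectrogram and waveform; only the adversarial part is shown here.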

Results

Task                      Dataset               Metric                     Value   Model
Audio Generation          VCTK Multi-Speaker    Log-Spectral Distance      0.76    CMGAN
Speech Enhancement        VoiceBank + DEMAND    CBAK                       3.94    CMGAN
Speech Enhancement        VoiceBank + DEMAND    COVL                       4.12    CMGAN
Speech Enhancement        VoiceBank + DEMAND    CSIG                       4.63    CMGAN
Speech Enhancement        VoiceBank + DEMAND    PESQ (wb)                  3.41    CMGAN
Speech Enhancement        VoiceBank + DEMAND    SSNR (dB)                  11.1    CMGAN
Speech Enhancement        VoiceBank + DEMAND    STOI (%)                   96      CMGAN
Audio Super-Resolution    VCTK Multi-Speaker    Log-Spectral Distance      0.76    CMGAN
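Two of the metrics in the table above are simple enough to sketch directly: segmental SNR (SSNR, higher is better) and log-spectral distance (LSD, lower is better). The code below is a minimal, stdlib-only sketch under stated assumptions, not the evaluation script behind these numbers. SSNR here uses non-overlapping rectangular frames with per-frame values clipped to [-10, 35] dB, a common convention; reference implementations often use windowed, overlapping frames, so results will not match the table exactly. LSD assumes power spectrograms are already computed (indexed as [frame][frequency bin]) and uses log10 power; some works use other log scalings. The function names and parameters (`frame_len`, `floor_db`, `ceil_db`, `eps`) are hypothetical.

```python
# Hypothetical, simplified sketches of SSNR and LSD (not the official evaluation code).
import math
from typing import Sequence


def segmental_snr(clean: Sequence[float], enhanced: Sequence[float],
                  frame_len: int = 256, floor_db: float = -10.0,
                  ceil_db: float = 35.0, eps: float = 1e-10) -> float:
    """Mean per-frame SNR in dB over non-overlapping frames, each clipped to [floor_db, ceil_db]."""
    n = min(len(clean), len(enhanced))
    frame_snrs = []
    for start in range(0, n - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        signal_energy = sum(x * x for x in c)
        # The residual between clean and enhanced is treated as noise.
        noise_energy = sum((x - y) ** 2 for x, y in zip(c, e))
        snr_db = 10.0 * math.log10(signal_energy / (noise_energy + eps) + eps)
        frame_snrs.append(min(max(snr_db, floor_db), ceil_db))
    if not frame_snrs:
        raise ValueError("signals are shorter than one frame")
    return sum(frame_snrs) / len(frame_snrs)


def log_spectral_distance(ref_power: Sequence[Sequence[float]],
                          est_power: Sequence[Sequence[float]],
                          eps: float = 1e-10) -> float:
    """Frame-averaged RMS difference of log10 power spectra, inputs indexed [frame][bin]."""
    per_frame = []
    for ref_frame, est_frame in zip(ref_power, est_power):
        sq_diffs = [(math.log10(r + eps) - math.log10(e + eps)) ** 2
                    for r, e in zip(ref_frame, est_frame)]
        per_frame.append(math.sqrt(sum(sq_diffs) / len(sq_diffs)))
    if not per_frame:
        raise ValueError("empty spectrogram")
    return sum(per_frame) / len(per_frame)


if __name__ == "__main__":
    clean = [math.sin(0.1 * i) for i in range(1024)]
    enhanced = [0.9 * x for x in clean]  # residual is 10% of the clean amplitude
    print("SSNR (dB):", segmental_snr(clean, enhanced))
    print("LSD:", log_spectral_distance([[1.0, 2.0]], [[1.5, 1.8]]))
```

Clipping each frame's SNR keeps silent frames (near-zero signal energy) and near-perfect frames from dominating the average, which is why SSNR is usually reported alongside global measures such as PESQ.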

Related Papers

Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine (2025-07-17)
NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech (2025-07-17)
fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models (2025-07-17)
SpectraLift: Physics-Guided Spectral-Inversion Network for Self-Supervised Hyperspectral Image Super-Resolution (2025-07-17)
Autoregressive Speech Enhancement via Acoustic Tokens (2025-07-17)
Similarity-Guided Diffusion for Contrastive Sequential Recommendation (2025-07-16)
HUG-VAS: A Hierarchical NURBS-Based Generative Model for Aortic Geometry Synthesis and Controllable Editing (2025-07-15)