SCP-GAN: Self-Correcting Discriminator Optimization for Training Consistency Preserving Metric GAN on Speech Enhancement Tasks

Vasily Zadorozhnyy, Qiang Ye, Kazuhito Koishida

2022-10-26
Speech Enhancement

Abstract

In recent years, Generative Adversarial Networks (GANs) have produced significantly improved results in speech enhancement (SE) tasks, but they remain difficult to train. In this work, we introduce several improvements to GAN training schemes that can be applied to most GAN-based SE models. We propose consistency loss functions, which target the inconsistency between the time and time-frequency domains caused by the Fourier and inverse Fourier transforms. We also present self-correcting optimization for training a GAN discriminator on SE tasks, which helps avoid "harmful" training directions for parts of the discriminator loss function. We tested the proposed methods on several state-of-the-art GAN-based SE models and obtained consistent improvements, including new state-of-the-art results on the VoiceBank + DEMAND dataset.
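The inconsistency that the abstract refers to can be illustrated with a toy example. The sketch below is not the paper's loss function. It only shows the underlying effect, using a hand-written STFT with illustrative parameters (`N`, `HOP`, the square-root Hann window, and the test signal are all assumptions made here). A spectrogram computed from a real signal survives an iSTFT followed by an STFT unchanged, whereas a spectrogram whose bins were edited independently, as an enhancement mask would do, generally does not.

```python
import cmath
import math

N, HOP = 8, 4  # frame length and hop size (illustrative values, not from the paper)
# Square-root periodic Hann window: squared copies at 50% overlap sum to 1.
WIN = [math.sin(math.pi * n / N) for n in range(N)]

def dft(frame, inverse=False):
    """Naive O(N^2) DFT; fine for a toy example."""
    sign = 1 if inverse else -1
    size = len(frame)
    out = [sum(v * cmath.exp(sign * 2j * math.pi * k * t / size)
               for t, v in enumerate(frame)) for k in range(size)]
    return [v / size for v in out] if inverse else out

def stft(signal):
    starts = range(0, len(signal) - N + 1, HOP)
    return [dft([WIN[n] * signal[s + n] for n in range(N)]) for s in starts]

def istft(spec):
    """Weighted overlap-add inverse (least-squares estimate of the signal)."""
    length = HOP * (len(spec) - 1) + N
    out, norm = [0.0] * length, [0.0] * length
    for i, frame in enumerate(spec):
        for n, v in enumerate(dft(frame, inverse=True)):
            out[i * HOP + n] += WIN[n] * v.real
            norm[i * HOP + n] += WIN[n] ** 2
    return [o / w if w > 1e-12 else 0.0 for o, w in zip(out, norm)]

def inconsistency(spec):
    """Squared distance between spec and STFT(iSTFT(spec))."""
    reproj = stft(istft(spec))
    return sum(abs(a - b) ** 2 for fa, fb in zip(spec, reproj) for a, b in zip(fa, fb))

signal = [math.sin(math.pi * n / 2 + 0.3) + 0.3 * math.sin(0.9 * n) for n in range(20)]
spec = stft(signal)
clean_gap = inconsistency(spec)  # spectrogram of a real signal: gap is numerically zero

edited = [list(frame) for frame in spec]
edited[1][2] *= 0.5       # attenuate one time-frequency bin ...
edited[1][N - 2] *= 0.5   # ... and its conjugate-symmetric partner
edited_gap = inconsistency(edited)  # edited spectrogram: gap is clearly non-zero

print(clean_gap, edited_gap)
```

A consistency-style penalty can be built from a gap of this kind, computed on the enhanced spectrogram or on the resynthesized waveform; the exact form used for SCP-GAN is defined in the paper, not here.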

Results

Task	Dataset	Metric	Value	Model
Speech Enhancement	VoiceBank + DEMAND	CBAK	3.97	SCP-CMGAN
Speech Enhancement	VoiceBank + DEMAND	COVL	4.25	SCP-CMGAN
Speech Enhancement	VoiceBank + DEMAND	CSIG	4.75	SCP-CMGAN
Speech Enhancement	VoiceBank + DEMAND	PESQ (wb)	3.52	SCP-CMGAN
Speech Enhancement	VoiceBank + DEMAND	SSNR (dB)	10.82	SCP-CMGAN
Speech Enhancement	VoiceBank + DEMAND	STOI (%)	96	SCP-CMGAN

Related Papers

Autoregressive Speech Enhancement via Acoustic Tokens (2025-07-17)
P.808 Multilingual Speech Enhancement Testing: Approach and Results of URGENT 2025 Challenge (2025-07-15)
Robust One-step Speech Enhancement via Consistency Distillation (2025-07-08)
Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis (2025-07-08)
MambAttention: Mamba with Multi-Head Attention for Generalizable Single-Channel Speech Enhancement (2025-07-01)
Frequency-Weighted Training Losses for Phoneme-Level DNN-based Speech Enhancement (2025-06-23)
EDNet: A Distortion-Agnostic Speech Enhancement Framework with Gating Mamba Mechanism and Phase Shift-Invariant Training (2025-06-19)
A Comparative Evaluation of Deep Learning Models for Speech Enhancement in Real-World Noisy Environments (2025-06-17)