Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

RemixIT: Continual self-training of speech enhancement models via bootstrapped remixing

Efthymios Tzinis, Yossi Adi, Vamsi Krishna Ithapu, Buye Xu, Paris Smaragdis, Anurag Kumar

Published: 2022-02-17
Tasks: Unsupervised Domain Adaptation, Speech Enhancement, Domain Adaptation

Abstract

We present RemixIT, a simple yet effective self-supervised method for training speech enhancement models without requiring a single isolated in-domain speech or noise waveform. Our approach overcomes a limitation of previous methods, which depend on clean in-domain target signals and are therefore sensitive to any domain mismatch between training and test samples. RemixIT is based on a continuous self-training scheme in which a teacher model pre-trained on out-of-domain data infers estimated pseudo-target signals for in-domain mixtures. Then, by permuting the estimated clean and noise signals and remixing them, we generate a new set of bootstrapped mixtures and corresponding pseudo-targets, which are used to train the student network. Conversely, the teacher periodically refines its estimates using the updated parameters of the latest student models. Experimental results on multiple speech enhancement datasets and tasks not only show the superiority of our method over prior approaches but also show that RemixIT can be combined with any separation model and applied to any semi-supervised or unsupervised domain adaptation task. Our analysis, paired with empirical evidence, sheds light on the inner workings of our self-training scheme, in which the student model keeps improving while observing severely degraded pseudo-targets.
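The remixing step described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: `bootstrapped_remix`, `toy_teacher`, and `update_teacher` are hypothetical names, the teacher is a trivial stand-in for a real separation network, and the weighted-average teacher update (with `gamma=1.0` reducing to a plain copy of the student) is only one assumed way to realize the "periodic refinement" the abstract mentions.

```python
import numpy as np


def toy_teacher(mixtures):
    """Hypothetical stand-in for a pre-trained teacher network.

    Splits each mixture into a fixed 80/20 "speech"/"noise" estimate so the
    sketch runs without any trained model.
    """
    speech_est = 0.8 * mixtures
    noise_est = mixtures - speech_est
    return speech_est, noise_est


def bootstrapped_remix(mixtures, teacher, rng):
    """Build bootstrapped mixtures and pseudo-targets from in-domain mixtures.

    mixtures: array of shape (batch, time) holding noisy in-domain signals.
    teacher:  callable returning (speech_est, noise_est), each (batch, time).
    """
    speech_est, noise_est = teacher(mixtures)
    # Shuffle the noise estimates across the batch before remixing.
    perm = rng.permutation(mixtures.shape[0])
    noise_perm = noise_est[perm]
    # New training input for the student; the teacher's estimates are the targets.
    new_mixtures = speech_est + noise_perm
    return new_mixtures, speech_est, noise_perm


def update_teacher(teacher_params, student_params, gamma=1.0):
    """Assumed update rule: blend student weights into the teacher.

    gamma=1.0 copies the student; smaller values give a running average.
    """
    return {
        name: gamma * student_params[name] + (1.0 - gamma) * teacher_params[name]
        for name in teacher_params
    }
```

In a training loop under these assumptions, the student would be fitted to map `new_mixtures` to the pair `(speech_est, noise_perm)`, and `update_teacher` would be called every few epochs.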

Results

| Task | Dataset | Metric | Value | Model |
| Speech Enhancement | Deep Noise Suppression (DNS) Challenge | PESQ-WB | 2.95 | Sudo rm -rf (U=32) |
| Speech Enhancement | Deep Noise Suppression (DNS) Challenge | SI-SDR-WB | 19.7 | Sudo rm -rf (U=32) |
| Speech Enhancement | Deep Noise Suppression (DNS) Challenge | PESQ-WB | 2.34 | RemixIT (w Sudo U=32) |
| Speech Enhancement | Deep Noise Suppression (DNS) Challenge | SI-SDR-WB | 16 | RemixIT (w Sudo U=32) |
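
For readers unfamiliar with the SI-SDR metric reported above, the following is a minimal sketch of the standard scale-invariant signal-to-distortion ratio in dB. The function name `si_sdr` and the `eps` stabilizer are illustrative choices; the exact evaluation settings behind the table (such as the wideband configuration) are not stated on this page and are not reproduced here.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB for two 1-D signals of equal length."""
    # Project the estimate onto the reference to find the optimally scaled target.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    # Everything in the estimate not explained by the scaled reference is error.
    error = estimate - target
    ratio = (np.sum(target ** 2) + eps) / (np.sum(error ** 2) + eps)
    return 10.0 * np.log10(ratio)
```

Because the reference is rescaled before the error is measured, multiplying the estimate by a constant gain leaves the score essentially unchanged.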
