Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Continuous Speech Separation with Conformer

Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Jinyu Li, Takuya Yoshioka, Chengyi Wang, Shujie Liu, Ming Zhou

2020-08-13 | Speech Separation

Abstract

Continuous speech separation plays a vital role in complicated speech-related tasks such as conversation transcription. The separation model extracts a single-speaker signal from mixed speech. In this paper, we use transformer and conformer models in lieu of recurrent neural networks in the separation system, as we believe capturing global information with self-attention-based methods is crucial for speech separation. Evaluated on the LibriCSS dataset, the conformer separation model achieves state-of-the-art results, with a relative 23.5% word error rate (WER) reduction from bi-directional LSTM (BLSTM) in the utterance-wise evaluation and a 15.4% WER reduction in the continuous evaluation.

Results

Task               Dataset   Metric  Value  Model
Speech Separation  LibriCSS  0L      5      Conformer (large)
Speech Separation  LibriCSS  0S      5.4    Conformer (large)
Speech Separation  LibriCSS  10%     7.5    Conformer (large)
Speech Separation  LibriCSS  20%     10.7   Conformer (large)
Speech Separation  LibriCSS  30%     13.8   Conformer (large)
Speech Separation  LibriCSS  40%     17.1   Conformer (large)
Speech Separation  LibriCSS  0L      5.4    Conformer (base)
Speech Separation  LibriCSS  0S      5.6    Conformer (base)
Speech Separation  LibriCSS  10%     8.2    Conformer (base)
Speech Separation  LibriCSS  20%     11.8   Conformer (base)
Speech Separation  LibriCSS  30%     15.5   Conformer (base)
Speech Separation  LibriCSS  40%     18.9   Conformer (base)
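
The abstract reports improvements as relative WER reductions. As a rough illustration of that arithmetic, the sketch below applies the same formula to the two models listed in the table above, comparing Conformer (large) against Conformer (base) for each LibriCSS condition. It assumes the Value column is WER in percent (the page does not label the unit), and the names `base`, `large`, and `relative_reduction` are made up for this example. This is not the BLSTM comparison from the abstract, since BLSTM results are not listed on this page.

```python
# Values copied from the results table above, keyed by LibriCSS condition.
# Assumption: each value is a word error rate in percent.
base = {"0L": 5.4, "0S": 5.6, "10%": 8.2, "20%": 11.8, "30%": 15.5, "40%": 18.9}
large = {"0L": 5.0, "0S": 5.4, "10%": 7.5, "20%": 10.7, "30%": 13.8, "40%": 17.1}


def relative_reduction(baseline: float, improved: float) -> float:
    """Return the relative reduction in percent: 100 * (baseline - improved) / baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - improved) / baseline


# Relative WER reduction of the large model over the base model, per condition.
reductions = {cond: relative_reduction(base[cond], large[cond]) for cond in base}

for cond, r in reductions.items():
    print(f"{cond:>3}: {base[cond]:5.1f} -> {large[cond]:5.1f}  ({r:.1f}% relative reduction)")
```

A relative reduction is measured against the baseline's own error rate, so an absolute drop of 0.4 points from 5.4 counts for more than the same drop from 18.9 would.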

Related Papers

Dynamic Slimmable Networks for Efficient Speech Separation (2025-07-08)
Improving Practical Aspects of End-to-End Multi-Talker Speech Recognition for Online and Offline Scenarios (2025-06-17)
SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline (2025-05-25)
Attractor-Based Speech Separation of Multiple Utterances by Unknown Number of Speakers (2025-05-22)
Single-Channel Target Speech Extraction Utilizing Distance and Room Clues (2025-05-20)
Time-Frequency-Based Attention Cache Memory Model for Real-Time Speech Separation (2025-05-19)
SepPrune: Structured Pruning for Efficient Deep Speech Separation (2025-05-17)
A Survey of Deep Learning for Complex Speech Spectrograms (2025-05-13)