Nikolai Lund Kühne, Jan Østergaard, Jesper Jensen, Zheng-Hua Tan
While attention-based architectures such as Conformers excel at speech enhancement, they scale poorly with input sequence length. In contrast, the recently proposed Extended Long Short-Term Memory (xLSTM) architecture offers linear scalability; however, xLSTM-based models remain unexplored for speech enhancement. This paper introduces xLSTM-SENet, the first xLSTM-based single-channel speech enhancement system. A comparative analysis reveals that xLSTM (and, notably, even LSTM) can match or outperform state-of-the-art Mamba- and Conformer-based systems across various model sizes in speech enhancement on the VoiceBank+DEMAND dataset. Through ablation studies, we identify key architectural design choices, such as exponential gating and bidirectionality, that contribute to its effectiveness. Our best xLSTM-based model, xLSTM-SENet2, outperforms state-of-the-art Mamba- and Conformer-based systems of similar complexity on the VoiceBank+DEMAND dataset.
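The exponential gating identified in the ablation replaces the sigmoid input/forget gates of a classic LSTM with exponentials, which requires a log-domain stabilizer to avoid overflow. The following is a minimal scalar sketch of one such recurrent step in the style of xLSTM's sLSTM cell; the function name is hypothetical, the output gate and matrix-memory (mLSTM) variant are omitted, and this is not the paper's implementation:

```python
import math

def exp_gated_step(c_prev, n_prev, m_prev, i_pre, f_pre, z):
    """One recurrent step with exponential gating and log-domain stabilization.

    Hypothetical illustrative function, not the xLSTM-SENet code.
    i_pre, f_pre: pre-activations of the input/forget gates; z: cell input.
    """
    # Raw gates exp(i_pre) and exp(f_pre) can overflow, so track the running
    # maximum m_t = max(f_pre + m_{t-1}, i_pre) and rescale both by exp(-m_t).
    m = max(f_pre + m_prev, i_pre)
    i = math.exp(i_pre - m)           # stabilized input gate
    f = math.exp(f_pre + m_prev - m)  # stabilized forget gate
    c = f * c_prev + i * z            # cell state update
    n = f * n_prev + i                # normalizer state (keeps h well-scaled)
    h = c / n                         # normalized hidden output (output gate omitted)
    return c, n, m, h
```

Because both gates are divided by the same running maximum, the cell/normalizer ratio `h` stays finite even for very large gate pre-activations, which is what makes the exponential gates trainable in practice. Bidirectionality, the other ablated choice, simply runs such a cell over the sequence in both directions and combines the outputs.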
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Speech Enhancement | VoiceBank + DEMAND | CBAK | 3.98 | xLSTM-SENet2 |
| Speech Enhancement | VoiceBank + DEMAND | COVL | 4.27 | xLSTM-SENet2 |
| Speech Enhancement | VoiceBank + DEMAND | CSIG | 4.78 | xLSTM-SENet2 |
| Speech Enhancement | VoiceBank + DEMAND | PESQ (WB) | 3.53 | xLSTM-SENet2 |
| Speech Enhancement | VoiceBank + DEMAND | Params (M) | 2.27 | xLSTM-SENet2 |
| Speech Enhancement | VoiceBank + DEMAND | STOI | 0.96 | xLSTM-SENet2 |