MANNER: Multi-view Attention Network for Noise Erasure

Hyun Joon Park, Byung Ha Kang, WooSeok Shin, Jin Sob Kim, Sung Won Han

2022-03-04 · Speech Enhancement

Abstract

In the field of speech enhancement, time-domain methods have difficulty achieving both high performance and efficiency. Recently, dual-path models have been adopted to represent long sequential features, but they still have limited representational capacity and poor memory efficiency. In this study, we propose the Multi-view Attention Network for Noise ERasure (MANNER), which consists of a convolutional encoder-decoder with a multi-view attention block and operates directly on time-domain signals. MANNER efficiently extracts three different representations from noisy speech and estimates high-quality clean speech. We evaluated MANNER on the VoiceBank-DEMAND dataset using five objective speech quality metrics. Experimental results show that MANNER achieves state-of-the-art performance while processing noisy speech efficiently.
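The abstract does not describe the internals of the multi-view attention block. Purely as an illustration, the NumPy sketch below shows one way a block could extract three views of a (channels × time) feature map and fuse them: a squeeze-and-excitation style channel gate, scaled dot-product self-attention across time for global context, and a gated depthwise convolution for local context. The choice of these three views, the concatenate-and-project fusion, the residual connection, and every function and parameter name here are assumptions for illustration, not the authors' published implementation.

```python
import numpy as np

# Hypothetical sketch of a "multi-view" block; not the authors' code.
# x has shape (C, T): C feature channels, T time steps.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def channel_view(x, w1, w2):
    # Squeeze over time, then compute one gate per channel (SE-style).
    s = x.mean(axis=1)                              # (C,)
    g = sigmoid(w2 @ np.maximum(w1 @ s, 0.0))       # (C,)
    return x * g[:, None]

def global_view(x, wq, wk, wv):
    # Scaled dot-product self-attention across all time steps.
    q, k, v = wq @ x, wk @ x, wv @ x                # each (C, T)
    attn = softmax(q.T @ k / np.sqrt(x.shape[0]))   # (T, T), rows sum to 1
    return v @ attn.T                               # out[:, t] = sum_s attn[t, s] * v[:, s]

def local_view(x, kernel):
    # Depthwise 1-D convolution ("same" length) used as a sigmoid gate.
    y = np.stack([np.convolve(x[c], kernel[c], mode="same") for c in range(x.shape[0])])
    return x * sigmoid(y)

def multi_view_block(x, params):
    views = [
        channel_view(x, params["w1"], params["w2"]),
        global_view(x, params["wq"], params["wk"], params["wv"]),
        local_view(x, params["kernel"]),
    ]
    # Fuse: concatenate along channels (3C), project back to C, add a residual.
    fused = params["wo"] @ np.concatenate(views, axis=0)
    return x + fused

rng = np.random.default_rng(0)
C, T, r = 8, 32, 2  # toy sizes; r is a hypothetical channel-reduction ratio
params = {
    "w1": rng.standard_normal((C // r, C)) * 0.1,
    "w2": rng.standard_normal((C, C // r)) * 0.1,
    "wq": rng.standard_normal((C, C)) * 0.1,
    "wk": rng.standard_normal((C, C)) * 0.1,
    "wv": rng.standard_normal((C, C)) * 0.1,
    "kernel": rng.standard_normal((C, 3)) * 0.1,
    "wo": rng.standard_normal((C, 3 * C)) * 0.1,
}
x = rng.standard_normal((C, T))
y = multi_view_block(x, params)
print(y.shape)  # (8, 32)
```

Each view keeps the (C, T) shape, so the views can be concatenated and projected back to C channels; with the projection `wo` set to zero the block reduces to the identity, which is one reason a residual form like this is a convenient default.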

Results

Task	Dataset	Metric	Value	Model
Speech Enhancement	VoiceBank + DEMAND	CBAK	3.65	MANNER
Speech Enhancement	VoiceBank + DEMAND	COVL	3.91	MANNER
Speech Enhancement	VoiceBank + DEMAND	CSIG	4.53	MANNER
Speech Enhancement	VoiceBank + DEMAND	PESQ (wb)	3.21	MANNER
Speech Enhancement	VoiceBank + DEMAND	STOI (%)	95	MANNER
