Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

XLSR-Mamba: A Dual-Column Bidirectional State Space Model for Spoofing Attack Detection

Yang Xiao, Rohan Kumar Das

2024-11-15 | Speech Recognition | Automatic Speech Recognition | Self-Supervised Learning | Audio Deepfake Detection

Abstract

Transformers and their variants have achieved great success in speech processing. However, their multi-head self-attention mechanism is computationally expensive. Mamba, a novel selective state space model, has therefore been proposed as an alternative. Building on its success in automatic speech recognition, we apply Mamba to spoofing attack detection. Mamba is well suited to this task because it can handle long sequences and thereby capture the artifacts in spoofed speech signals. However, Mamba's performance may suffer when it is trained with limited labeled data. To mitigate this, we propose combining a new dual-column Mamba structure with self-supervised learning, using the pre-trained wav2vec 2.0 model. Experiments show that the proposed approach achieves competitive results and faster inference on the ASVspoof 2021 LA and DF datasets. On the more challenging In-the-Wild dataset, it emerges as the strongest candidate for spoofing attack detection. The code is publicly available at https://github.com/swagshaw/XLSR-Mamba.
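
The abstract does not describe the internals of the dual-column bidirectional structure, so the following is only a toy illustration of the general idea of scanning a sequence in both directions with two parallel columns. It is not the authors' architecture. Everything here is an assumption made for illustration: the function names `ssm_scan` and `dual_column_bidirectional` are hypothetical, the recurrence is a fixed scalar linear state space model rather than Mamba's input-dependent (selective) scan, and merging the two columns by summation is one possible choice among several.

```python
def ssm_scan(xs, a=0.9, b=0.1):
    """Toy linear state space recurrence (NOT Mamba's selective scan).

    h_t = a * h_{t-1} + b * x_t, and the output at step t is h_t.
    """
    h = 0.0
    out = []
    for x in xs:
        h = a * h + b * x
        out.append(h)
    return out


def dual_column_bidirectional(xs, a=0.9, b=0.1):
    """Run one column left-to-right and one right-to-left, then merge.

    Assumption: the columns are merged by element-wise summation.
    """
    forward = ssm_scan(xs, a, b)
    # Reverse the input, scan it, then reverse the output back so that
    # position t lines up with position t of the forward column.
    backward = ssm_scan(xs[::-1], a, b)[::-1]
    return [f + g for f, g in zip(forward, backward)]


if __name__ == "__main__":
    print(dual_column_bidirectional([1.0, 0.0, 0.0], a=0.5, b=1.0))
```

In this sketch each output position sees context from both earlier and later time steps, which is the usual motivation for bidirectional processing of a whole utterance.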

Results

Task                                 Dataset        Metric    Value  Model
3D Reconstruction                    ASVspoof 2021  21DF EER  1.88   XLSR-Mamba
3D Reconstruction                    ASVspoof 2021  21LA EER  0.93   XLSR-Mamba
Speaker Verification                 ASVspoof 2021  21DF EER  1.88   XLSR-Mamba
Speaker Verification                 ASVspoof 2021  21LA EER  0.93   XLSR-Mamba
3D                                   ASVspoof 2021  21DF EER  1.88   XLSR-Mamba
3D                                   ASVspoof 2021  21LA EER  0.93   XLSR-Mamba
DeepFake Detection                   ASVspoof 2021  21DF EER  1.88   XLSR-Mamba
DeepFake Detection                   ASVspoof 2021  21LA EER  0.93   XLSR-Mamba
3D Shape Reconstruction from Videos  ASVspoof 2021  21DF EER  1.88   XLSR-Mamba
3D Shape Reconstruction from Videos  ASVspoof 2021  21LA EER  0.93   XLSR-Mamba
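
The results are reported as EER (equal error rate), the operating point at which the false acceptance rate and the false rejection rate are equal. As a minimal sketch of how such a number can be computed from detector scores, assuming higher scores mean "more likely bona fide": the function name `compute_eer` and the example scores are hypothetical and are not taken from the paper or its code. This simple version scans the observed scores as candidate thresholds and does not interpolate between them, so official evaluation tooling may give slightly different values.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Return (eer, threshold) as a fraction in [0, 1] and a score.

    Assumes a higher score means the trial looks more like bona fide speech.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # False rejection: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


if __name__ == "__main__":
    # Made-up scores for illustration only.
    bonafide = [0.9, 0.8, 0.7, 0.3]
    spoof = [0.6, 0.4, 0.2, 0.1]
    eer, threshold = compute_eer(bonafide, spoof)
    print(f"EER = {100 * eer:.2f}% at threshold {threshold}")
```

The function returns a fraction; multiply by 100 to express it as a percentage. The page does not state the unit of the values in the table.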

Related Papers

Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine (2025-07-17)
NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech (2025-07-17)
A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys (2025-07-17)
WhisperKit: On-device Real-time ASR with Billion-Scale Transformers (2025-07-14)
Self-supervised Learning on Camera Trap Footage Yields a Strong Universal Face Embedder (2025-07-14)
VisualSpeaker: Visually-Guided 3D Avatar Lip Synthesis (2025-07-08)
Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis (2025-07-08)
A Hybrid Machine Learning Framework for Optimizing Crop Selection via Agronomic and Economic Forecasting (2025-07-06)