End-to-end anti-spoofing with RawNet2

Hemlata Tak, Jose Patino, Massimiliano Todisco, Andreas Nautsch, Nicholas Evans, Anthony Larcher

2020-11-02

Speaker Verification, Audio Deepfake Detection

Abstract

Spoofing countermeasures aim to protect automatic speaker verification systems from attempts to manipulate their reliability using spoofed speech signals. While results from the most recent ASVspoof 2019 evaluation show great potential to detect most forms of attack, some attacks continue to evade detection. This paper reports the first application of RawNet2 to anti-spoofing. RawNet2 ingests raw audio and has the potential to learn cues that are not detectable using more traditional countermeasure solutions. We describe the modifications made to the original RawNet2 architecture so that it can be applied to anti-spoofing. For A17 attacks, the results of our RawNet2 system are the second-best reported, while the fusion of RawNet2 and baseline countermeasures gives the second-best results reported for the full ASVspoof 2019 logical access condition. Our results are reproducible with open source software.

Results

Task	Dataset	Metric	Value	Model
Speaker Verification	ASVspoof 2021	21DF EER	40.06	RawNet-2
Speaker Verification	ASVspoof 2021	21LA EER	40.07	RawNet-2
DeepFake Detection	ASVspoof 2021	21DF EER	40.06	RawNet-2
DeepFake Detection	ASVspoof 2021	21LA EER	40.07	RawNet-2
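
The table reports equal error rate (EER), the operating point at which the false acceptance rate for spoofed speech equals the false rejection rate for bona fide speech. The following is a minimal sketch of how an EER can be estimated from countermeasure scores. It is not the official ASVspoof scoring tool: the function name `equal_error_rate` and the convention that a higher score means "more likely bona fide" are assumptions made for illustration.

```python
import numpy as np


def equal_error_rate(bonafide_scores, spoof_scores):
    """Estimate the EER in percent (hypothetical helper, not ASVspoof tooling).

    Assumes a higher score means the trial is more likely bona fide.
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Every observed score is a candidate decision threshold.
    thresholds = np.sort(np.concatenate([bonafide, spoof]))

    # False rejection rate: bona fide trials scored below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance rate: spoofed trials scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # Take the threshold where the two error rates are closest and average them.
    idx = np.argmin(np.abs(far - frr))
    return 100.0 * (far[idx] + frr[idx]) / 2.0


if __name__ == "__main__":
    # Toy scores only, not data from the paper.
    print(equal_error_rate([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]))  # 0.0, perfectly separated
```

Because this sketch evaluates only the observed scores as thresholds and does not interpolate between them, its output on small score sets is a coarse estimate.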
