Tandem spoofing-robust automatic speaker verification based on time-domain embeddings

Avishai Weizman, Yehuda Ben-Shimol, Itshak Lapidot

2024-12-22

Voice Anti-spoofing, Speaker Verification

Abstract

Spoofing-robust automatic speaker verification (SASV) systems are crucial for protecting against spoofed speech. In this study, we focus on logical access attacks and introduce a novel approach to SASV tasks. Genuine and spoofed speech are represented by the probability mass function (PMF) of waveform amplitudes in the time domain, and time embeddings are derived from the PMF of selected groups within the training set. This paper highlights the role of gender segregation and its positive impact on performance. We propose a countermeasure (CM) system that employs time-domain embeddings derived from the PMF of spoofed and genuine speech, as well as gender recognition based on male and female time-based embeddings. The method exhibits notable gender recognition capabilities, with mismatch rates of 0.94% and 1.79% for males and females, respectively. The male and female CM systems achieve equal error rates (EER) of 8.67% and 10.12%, respectively. By integrating this approach with traditional speaker verification systems, we demonstrate improved generalization ability and tandem detection cost function (t-DCF) performance on the ASVspoof2019 challenge database. Furthermore, we investigate the impact of fusing the time embedding approach with traditional CM systems and illustrate how this fusion enhances generalization in SASV architectures.
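To make the amplitude-PMF idea concrete, the following is a minimal sketch of how a PMF of waveform amplitudes could be estimated for one utterance and pooled over a group of utterances (for example, all genuine training utterances). It is an illustration under stated assumptions, not the authors' implementation: the function names, the bin count, the [-1, 1] amplitude scaling, and the L1 comparison between PMFs are all hypothetical choices, and the abstract does not specify how embeddings are actually derived from the group PMFs.

```python
from typing import Iterable, List, Sequence


def amplitude_pmf(samples: Iterable[float], n_bins: int = 64) -> List[float]:
    """Empirical PMF of amplitudes. Assumes samples are scaled to [-1.0, 1.0]."""
    counts = [0] * n_bins
    total = 0
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip out-of-range samples (assumption)
        # Map [-1, 1] onto bin indices 0..n_bins-1; s == 1.0 falls in the last bin.
        idx = min(int((s + 1.0) / 2.0 * n_bins), n_bins - 1)
        counts[idx] += 1
        total += 1
    if total == 0:
        raise ValueError("no samples given")
    return [c / total for c in counts]


def pooled_pmf(utterances: Iterable[Sequence[float]], n_bins: int = 64) -> List[float]:
    """PMF over all samples pooled from a group of utterances."""
    return amplitude_pmf((s for u in utterances for s in u), n_bins)


def l1_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Hypothetical comparison between two PMFs (not taken from the paper)."""
    return sum(abs(a - b) for a, b in zip(p, q))


# Toy usage: compare one utterance against a pooled group reference.
group = [[-1.0, -0.5], [0.0, 0.5, 1.0]]
reference = pooled_pmf(group, n_bins=4)
utterance = amplitude_pmf([0.1, 0.2, 0.3], n_bins=4)
print(l1_distance(utterance, reference))
```

In a gender-dependent setup like the one described, separate reference PMFs could be pooled per group (male genuine, male spoofed, and so on), with the comparison values against each reference forming a small feature vector; that arrangement is likewise an assumption here.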

Results

Task	Dataset	Metric	Value	Model
Speaker Verification	ASVspoof 2019 - LA	minDCF	0.004	ECAPA-TDNN
Voice Anti-spoofing	ASVspoof 2019 - LA	min a-DCF	0.1684	GD
Voice Anti-spoofing	ASVspoof 2019 - LA	min t-DCF	0.2709	GD
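
The countermeasure figures quoted in the abstract are equal error rates (EER), the operating point at which the false acceptance rate and the false rejection rate coincide. The sketch below shows one simple way to estimate an EER from two score lists. It assumes that higher scores mean "more likely genuine"; the function name and the brute-force threshold sweep are illustrative choices, not the evaluation code used for the results above, and the DCF-style metrics in the table additionally depend on cost and prior parameters that are not reproduced here.

```python
from typing import Sequence


def equal_error_rate(genuine_scores: Sequence[float],
                     spoof_scores: Sequence[float]) -> float:
    """Estimate the EER by sweeping a threshold over all observed scores.

    Assumes higher score = more likely genuine. A trial is accepted when
    its score is >= the threshold.
    """
    if not genuine_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    thresholds.append(thresholds[-1] + 1.0)  # threshold that rejects everything
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection: genuine trials scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


# Toy scores, invented for illustration only.
genuine = [0.9, 0.8, 0.7, 0.3]
spoof = [0.1, 0.2, 0.4, 0.6]
print(equal_error_rate(genuine, spoof))
```

The sweep is quadratic in the number of trials, which is fine for a sketch; a sorted, cumulative-count version would be preferable for full evaluation sets.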
