


Attention Back-end for Automatic Speaker Verification with Multiple Enrollment Utterances

Chang Zeng, Xin Wang, Erica Cooper, Xiaoxiao Miao, Junichi Yamagishi

2021-04-04 · Speaker Verification

Abstract

Probabilistic linear discriminant analysis (PLDA) and cosine similarity have been widely used in traditional speaker verification systems as back-end techniques to measure pairwise similarities. To make better use of multiple enrollment utterances, we propose a novel attention back-end model, which can be used for both text-independent (TI) and text-dependent (TD) speaker verification, and employ scaled-dot self-attention and feed-forward self-attention networks as architectures that learn the intra-relationships of the enrollment utterances. To verify the proposed attention back-end, we conduct a series of experiments on the CNCeleb and VoxCeleb datasets by combining it with several state-of-the-art speaker encoders, including TDNN and ResNet. Experimental results using multiple enrollment utterances on CNCeleb show that the proposed attention back-end model leads to lower EER and minDCF scores than the PLDA and cosine similarity counterparts for each speaker encoder, and an experiment on VoxCeleb indicates that our model can be used even in the single-enrollment case.
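To make the idea in the abstract concrete, the following is a minimal sketch of scoring one trial with several enrollment utterances: scaled dot-product self-attention is applied across the enrollment embeddings, the attended vectors are averaged into one speaker representation, and that representation is compared with the test embedding by cosine similarity. It is not the authors' model. The identity query/key/value projections, the mean pooling, the cosine scoring step, and all function names (`self_attention`, `mean_pool`, `score_trial`) are assumptions made for illustration; the paper's learned layers and its feed-forward self-attention network are not reproduced here.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings):
    """Scaled dot-product self-attention over enrollment embeddings.

    Assumption: queries, keys, and values are the embeddings themselves
    (identity projections); a trained model would learn these projections.
    """
    d = len(embeddings[0])
    attended = []
    for q in embeddings:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in embeddings])
        attended.append(
            [sum(w * v[i] for w, v in zip(weights, embeddings)) for i in range(d)]
        )
    return attended


def mean_pool(vectors):
    """Average a list of vectors into a single vector."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def score_trial(enroll_embeddings, test_embedding):
    """Hypothetical trial score: attend, pool, then compare by cosine."""
    speaker_vec = mean_pool(self_attention(enroll_embeddings))
    return cosine(speaker_vec, test_embedding)


if __name__ == "__main__":
    # Toy 2-dimensional embeddings, chosen only for illustration.
    enroll = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2]]
    print(score_trial(enroll, [1.0, 0.0]))  # test utterance close to the enrollment set
    print(score_trial(enroll, [0.0, 1.0]))  # test utterance far from the enrollment set
```

With a single enrollment utterance the attention weights collapse to 1, so the sketch reduces to plain cosine scoring, which loosely mirrors the single-enrollment case mentioned in the abstract.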

Results

Task                  Dataset   Metric  Value  Model
Speaker Verification  CN-CELEB  EER     10.12  X-Vectors with Attention Backend
Speaker Verification  CN-CELEB  EER     10.77  ResNet with Attention Backend
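For readers unfamiliar with the metric in the table, the equal error rate (EER) is the operating point at which the false acceptance rate and the false rejection rate coincide. The sketch below estimates it from two lists of trial scores by sweeping the observed scores as thresholds. The function name `equal_error_rate`, the accept-if-score-at-or-above-threshold rule, and the toy scores are assumptions for illustration; this is not the evaluation script behind the reported numbers.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER as a fraction in [0, 1].

    target_scores: scores of trials where the test speaker matches enrollment.
    nontarget_scores: scores of impostor trials.
    A trial is accepted when its score is >= the threshold (an assumption).
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap = float("inf")
    eer = None
    for t in thresholds:
        # False rejection rate: target trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance rate: impostor trials scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap = gap
            eer = (far + frr) / 2
    return eer


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    targets = [0.9, 0.8, 0.7, 0.3]
    nontargets = [0.6, 0.4, 0.2, 0.1]
    print(equal_error_rate(targets, nontargets))
```

The function returns a fraction; if the table values are percentages, as is conventional for EER, multiply by 100 before comparing.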
