
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark

Yuke Lin, Ming Cheng, FuLin Zhang, Yingying Gao, Shilei Zhang, Ming Li

2024-07-16 | Speaker Identification | Speaker Recognition | Speaker Verification

Abstract

In this paper, we provide a large audio-visual speaker recognition dataset, VoxBlink2, which includes approximately 10M utterances with videos from 110K+ speakers in the wild. This dataset represents a significant expansion over the VoxBlink dataset, encompassing a broader diversity of speakers and scenarios thanks to an optimized data collection pipeline. We then explore the impact of training strategies, data scale, and model complexity on speaker verification, and establish a new single-model state-of-the-art EER of 0.170% and minDCF of 0.006% on the VoxCeleb1-O test set. These results motivate us to explore speaker recognition from a new, challenging perspective. We propose the Open-Set Speaker-Identification task, which is designed to either match a probe utterance with a known gallery speaker or categorize it as an unknown query. For this task, we design a concrete benchmark and evaluation protocols. The data and model resources can be found at http://voxblink2.github.io.
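
The open-set decision described in the abstract (match a probe to a known gallery speaker, or reject it as unknown) can be illustrated with a minimal sketch. This is not the paper's benchmark protocol: the cosine-similarity scoring, the fixed threshold, the function names, and the toy two-dimensional embeddings are all assumptions made for illustration, and embeddings are assumed to be non-zero vectors.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify_open_set(
    probe: List[float],
    gallery: Dict[str, List[float]],
    threshold: float,
) -> Tuple[Optional[str], float]:
    """Return (speaker_id, score) for the best gallery match,
    or (None, score) when the best score falls below the threshold.

    Hypothetical helper: the scoring rule and threshold are assumptions,
    not the evaluation protocol defined in the paper.
    """
    best_id: Optional[str] = None
    best_score = -1.0
    for speaker_id, embedding in gallery.items():
        score = cosine(probe, embedding)
        if score > best_score:
            best_id, best_score = speaker_id, score
    if best_score < threshold:
        return None, best_score  # treated as an unknown query
    return best_id, best_score


# Toy gallery with made-up 2-D embeddings (real embeddings are much larger).
gallery = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
known = identify_open_set([0.9, 0.1], gallery, threshold=0.5)
unknown = identify_open_set([1.0, 1.0], gallery, threshold=0.9)
```

In this sketch the first probe is close to one enrolled speaker and is accepted, while the second is equally far from both and is rejected as unknown. Where to set the threshold is the central trade-off of the open-set setting, since it balances false accepts of unknown speakers against false rejects of enrolled ones.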

Results

Task | Dataset | Metric | Value | Model
Speaker Verification | VoxCeleb | EER | 0.2 | SimAM-ResNet100
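
For readers unfamiliar with the EER metric reported above: the equal error rate is the operating point at which the false acceptance rate (non-target trials accepted) equals the false rejection rate (target trials rejected). The sketch below is a simple threshold sweep over toy scores, not the scoring tool used by the authors; the function name and the example scores are assumptions for illustration.

```python
from typing import List


def compute_eer(target_scores: List[float], nontarget_scores: List[float]) -> float:
    """Approximate the equal error rate as a fraction in [0, 1].

    Sweeps every observed score as a decision threshold and returns the
    mean of FAR and FRR at the threshold where the two are closest.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap = float("inf")
    eer = 1.0
    for t in thresholds:
        # False acceptance: non-target trial scored at or above the threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        # False rejection: target trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Made-up similarity scores for same-speaker and different-speaker trials.
targets = [0.9, 0.8, 0.4, 0.7]
nontargets = [0.1, 0.5, 0.3, 0.2]
eer_percent = 100 * compute_eer(targets, nontargets)
```

Multiplying by 100 gives the percentage form in which EER is usually quoted. Production toolkits typically interpolate between thresholds on the ROC or DET curve, so their values can differ slightly from this coarse sweep on small trial lists.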

Related Papers

SHIELD: A Secure and Highly Enhanced Integrated Learning for Robust Deepfake Detection against Adversarial Attacks (2025-07-17)
An Exploration of ECAPA-TDNN and x-vector Speaker Representations in Zero-shot Multi-speaker TTS (2025-06-25)
SSAVSV: Towards Unified Model for Self-Supervised Audio-Visual Speaker Verification (2025-06-21)
A Comparative Evaluation of Deep Learning Models for Speech Enhancement in Real-World Noisy Environments (2025-06-17)
Pushing the Performance of Synthetic Speech Detection with Kolmogorov-Arnold Networks and Self-Supervised Learning Models (2025-06-17)
Mitigating Non-Target Speaker Bias in Guided Speaker Embedding (2025-06-14)
CoLMbo: Speaker Language Model for Descriptive Profiling (2025-06-11)
You Are What You Say: Exploiting Linguistic Content for VoicePrivacy Attacks (2025-06-11)