Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


VoxCeleb2

Audio, Images, Texts, Videos | CC BY 4.0 | Introduced 2018-01-01

VoxCeleb2 is a large-scale speaker recognition dataset obtained automatically from open-source media. It consists of over a million utterances from over 6k speakers. Because the dataset is collected ‘in the wild’, the speech segments are corrupted with real-world noise, including laughter, cross-talk, channel effects, music, and other sounds. The dataset is also multilingual, with speech from speakers of 145 different nationalities, covering a wide range of accents, ages, ethnicities, and languages. Because it is audio-visual, it is also useful for a number of other applications, such as visual speech synthesis, speech separation, cross-modal transfer from face to voice or vice versa, and training face recognition from video to complement existing face recognition datasets.

Source: VoxCeleb2: Deep Speaker Recognition

Image Source: https://www.robots.ox.ac.uk/~vgg/data/voxceleb/

Benchmarks

Speaker Verification/EER
Speech Separation/SI-SNRi
Speech Separation/SDRi

Related Benchmarks

VoxCeleb2 - 1-shot learning
Tasks: 10-shot image generation, 3D, 3D Face Modelling, 3D Face Reconstruction, Face Generation, Face Reconstruction, Facial Recognition and Modelling, Image Generation, Talking Head Generation
Metrics (for each task): CSIM, FID, LPIPS, Normalized Pose Error, SSIM, inference time (ms)

VoxCeleb2 - 32-shot learning
Tasks: 10-shot image generation, 3D, 3D Face Modelling, 3D Face Reconstruction, Face Generation, Face Reconstruction, Facial Recognition and Modelling, Image Generation, Talking Head Generation
Metrics (for each task): FID

VoxCeleb2 - 8-shot learning
Tasks: 10-shot image generation, 3D, 3D Face Modelling, 3D Face Reconstruction, Face Generation, Face Reconstruction, Facial Recognition and Modelling, Image Generation, Talking Head Generation
Metrics (for each task): FID

Statistics

Papers: 564
Benchmarks: 3

Links

Homepage

Tasks

Speaker Verification, Speech Separation, Talking Head Generation