Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

VoxCeleb2

Modalities: Audio, Images, Texts, Videos
License: CC BY 4.0
Introduced: 2018-01-01

VoxCeleb2 is a large-scale speaker recognition dataset obtained automatically from open-source media. It consists of over a million utterances from over 6,000 speakers. Because the dataset is collected ‘in the wild’, the speech segments are corrupted with real-world noise, including laughter, cross-talk, channel effects, music and other sounds. The dataset is also multilingual, with speech from speakers of 145 different nationalities, covering a wide range of accents, ages, ethnicities and languages. Because the dataset is audio-visual, it is also useful for a number of other applications, such as visual speech synthesis, speech separation, cross-modal transfer from face to voice or vice versa, and training face recognition from video to complement existing face recognition datasets.

Source: VoxCeleb2: Deep Speaker Recognition
Image Source: https://www.robots.ox.ac.uk/~vgg/data/voxceleb/

Benchmarks

Speaker Verification/EER
Speech Separation/SI-SNRi
Speech Separation/SDRi
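
The EER (equal error rate) metric named in the Speaker Verification benchmark is the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of that idea in plain Python, not an official VoxCeleb2 scoring script: the function name `equal_error_rate`, the toy scores and the toy labels are all illustrative assumptions, and it picks the closest observed threshold rather than interpolating between thresholds as production scoring tools may do.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER for a list of verification trials.

    scores: similarity scores; higher means "more likely the same speaker".
    labels: 1 for a same-speaker (target) trial, 0 for a different-speaker trial.
    """
    targets = [s for s, y in zip(scores, labels) if y == 1]
    nontargets = [s for s, y in zip(scores, labels) if y == 0]
    if not targets or not nontargets:
        raise ValueError("need at least one target and one non-target trial")

    best_gap, best_eer = None, None
    # Candidate thresholds: every observed score, plus one above the maximum
    # so that "reject everything" is also considered.
    thresholds = sorted(set(scores)) + [max(scores) + 1.0]
    for t in thresholds:
        # False acceptance: a non-target trial scored at or above the threshold.
        far = sum(s >= t for s in nontargets) / len(nontargets)
        # False rejection: a target trial scored below the threshold.
        frr = sum(s < t for s in targets) / len(targets)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


# Toy trial list (made up for illustration, not VoxCeleb2 data).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]
print(equal_error_rate(toy_scores, toy_labels))  # 0.25
```

In the toy example, a threshold of 0.6 accepts one of four non-target trials and rejects one of four target trials, so both error rates are 25%.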

Related Benchmarks

VoxCeleb2 - 1-shot learning
Metrics reported for each task: CSIM, FID, LPIPS, Normalized Pose Error, SSIM, inference time (ms)
Tasks: 10-shot image generation, 3D, 3D Face Modelling, 3D Face Reconstruction, Face Generation, Face Reconstruction, Facial Recognition and Modelling, Image Generation, Talking Head Generation

VoxCeleb2 - 32-shot learning
Metric reported for each task: FID
Tasks: 10-shot image generation, 3D, 3D Face Modelling, 3D Face Reconstruction, Face Generation, Face Reconstruction, Facial Recognition and Modelling, Image Generation, Talking Head Generation

VoxCeleb2 - 8-shot learning
Metric reported for each task: FID
Tasks: 10-shot image generation, 3D, 3D Face Modelling, 3D Face Reconstruction, Face Generation, Face Reconstruction, Facial Recognition and Modelling, Image Generation, Talking Head Generation

Statistics

Papers: 564
Benchmarks: 3

Tasks

Speaker Verification
Speech Separation
Talking Head Generation