LRS2

Lip Reading Sentences 2

Modalities: Audio, Texts, Videos
License: Custom (non-commercial)
Introduced: 2017-01-01

The Oxford-BBC Lip Reading Sentences 2 (LRS2) dataset is one of the largest publicly available datasets for lip reading sentences in the wild. The data consists mainly of news and talk shows from BBC programs. Each sentence is up to 100 characters long. The training, validation and test sets are divided according to broadcast date. The dataset is challenging because it contains thousands of speakers without speaker labels and a large variation in head pose. The pre-training set contains 96,318 utterances, the training set 45,839, the validation set 1,082 and the test set 1,242.
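
To put the split sizes in proportion, the short sketch below totals the utterance counts stated above and reports each split's share. The counts are copied from this description; the names SPLITS and split_shares are hypothetical helpers for illustration, not part of any official LRS2 tooling.

```python
# Utterance counts per LRS2 split, copied from the dataset description.
# SPLITS and split_shares are illustrative names, not an official API.
SPLITS = {
    "pretrain": 96_318,
    "train": 45_839,
    "val": 1_082,
    "test": 1_242,
}


def split_shares(splits: dict[str, int]) -> dict[str, float]:
    """Return each split's share of all utterances, as a percentage."""
    total = sum(splits.values())
    return {name: 100.0 * count / total for name, count in splits.items()}


if __name__ == "__main__":
    total = sum(SPLITS.values())
    print(f"total utterances: {total:,}")
    for name, share in split_shares(SPLITS).items():
        print(f"{name:>8}: {SPLITS[name]:>6,} ({share:.2f}%)")
```

The four splits sum to 144,481 utterances, and the validation and test sets together make up well under 2% of the total, so most of the data is meant for (pre-)training.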

Source: Audio-visual Recognition of Overlapped speech for the LRS2 dataset
Image Source: https://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrs2.html

Benchmarks

10-shot image generation: FID, LSE-D, LSE-C
3D: FID, LSE-D, LSE-C
3D Face Modelling: FID, LSE-D, LSE-C
3D Face Reconstruction: FID, LSE-D, LSE-C
Audio-Visual Speech Recognition: Test WER
Automatic Speech Recognition (ASR): Test WER
Face Generation: FID, LSE-D, LSE-C
Face Reconstruction: FID, LSE-D, LSE-C
Facial Recognition and Modelling: FID, LSE-D, LSE-C
Image Generation: LPIPS (S1, S2, S3, S4, S5), SIFID (S1, S2, S3, S4, S5), FID, LSE-D, LSE-C
Image Manipulation: LPIPS (S1, S2, S3, S4, S5), SIFID (S1, S2, S3, S4, S5)
Keyword Spotting: Top-1 Accuracy, Top-5 Accuracy, mAP, mAP IOU@0.5
Lipreading: Word Error Rate (WER)
Natural Language Transduction: Word Error Rate (WER)
Speech Recognition: Word Error Rate (WER), Test WER
Speech Separation: SI-SNRi, SDRi, PESQ, STOI
Talking Head Generation: FID, LSE-D, LSE-C
Visual Speech Recognition: Word Error Rate (WER)
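
Several of the benchmarks listed above (Lipreading, Visual Speech Recognition, Audio-Visual Speech Recognition, Speech Recognition) report word error rate: the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The sketch below is a minimal illustration, assuming plain whitespace tokenization and no case or punctuation normalization. Published LRS2 results may apply their own text normalization, so numbers from this sketch are not guaranteed to match leaderboard values. The function names are hypothetical.

```python
# Minimal WER sketch. Assumption: whitespace tokenization, no normalization.
# Function names are illustrative, not from any official LRS2 toolkit.


def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance over words (substitutions, deletions, insertions)."""
    # prev[j] holds the distance between the previous ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER for a single utterance."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def corpus_wer(pairs: list[tuple[str, str]]) -> float:
    """Corpus-level WER: total word errors divided by total reference words."""
    errors = 0
    words = 0
    for reference, hypothesis in pairs:
        ref_words = reference.split()
        errors += word_edit_distance(ref_words, hypothesis.split())
        words += len(ref_words)
    if words == 0:
        raise ValueError("references must contain at least one word in total")
    return errors / words
```

For example, word_error_rate("the cat sat on the mat", "the cat sat on mat") has one deletion over six reference words, giving 1/6. Summing errors and reference words over the whole test set, as corpus_wer does, weights long sentences more heavily than averaging per-utterance WER would.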

Related Benchmarks

LRS2+VGGSound, Speech Denoising: CBAK, COVL, CSIG, PESQ, STOI