Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

LRS2

Lip Reading Sentences 2

Modalities: Audio, Texts, Videos
License: Custom (non-commercial)
Introduced: 2017-01-01

The Oxford-BBC Lip Reading Sentences 2 (LRS2) dataset is one of the largest publicly available datasets for sentence-level lip reading in the wild. It consists mainly of news and talk shows from BBC programs. Each sentence is up to 100 characters long. The training, validation, and test sets are divided according to broadcast date. The dataset is challenging because it contains thousands of speakers without speaker labels and large variation in head pose. The pre-training set contains 96,318 utterances, the training set 45,839, the validation set 1,082, and the test set 1,242.
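As a quick sanity check on the split sizes quoted above, the following Python sketch totals the utterance counts and reports each split's share of the whole. The names `LRS2_SPLITS` and `split_fractions` are illustrative only and are not part of any official LRS2 tooling; the counts are copied from the description, not read from the dataset itself.

```python
# Utterance counts per split, copied from the dataset description.
# The dictionary name and keys are assumptions made for illustration.
LRS2_SPLITS = {
    "pre-train": 96_318,
    "train": 45_839,
    "validation": 1_082,
    "test": 1_242,
}


def split_fractions(splits):
    """Return each split's share of the total utterance count."""
    total = sum(splits.values())
    return {name: count / total for name, count in splits.items()}


if __name__ == "__main__":
    total = sum(LRS2_SPLITS.values())
    print(f"total utterances: {total}")
    for name, frac in split_fractions(LRS2_SPLITS).items():
        print(f"{name:>10}: {LRS2_SPLITS[name]:>6} ({frac:.1%})")
```

The four splits sum to 144,481 utterances, and the validation and test sets are each under 1% of that total, so reported test results rest on a comparatively small held-out set.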

Source: Audio-visual Recognition of Overlapped speech for the LRS2 dataset
Image Source: https://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrs2.html

Benchmarks

10-shot image generation: FID, LSE-D, LSE-C
3D: FID, LSE-D, LSE-C
3D Face Modelling: FID, LSE-D, LSE-C
3D Face Reconstruction: FID, LSE-D, LSE-C
Audio-Visual Speech Recognition: Test WER
Automatic Speech Recognition (ASR): Test WER
Face Generation: FID, LSE-D, LSE-C
Face Reconstruction: FID, LSE-D, LSE-C
Facial Recognition and Modelling: FID, LSE-D, LSE-C
Image Generation: LPIPS (S1), LPIPS (S2), LPIPS (S3), LPIPS (S4), LPIPS (S5), SIFID (S1), SIFID (S2), SIFID (S3), SIFID (S4), SIFID (S5), FID, LSE-D, LSE-C
Image Manipulation: LPIPS (S1), LPIPS (S2), LPIPS (S3), LPIPS (S4), LPIPS (S5), SIFID (S1), SIFID (S2), SIFID (S3), SIFID (S4), SIFID (S5)
Keyword Spotting: Top-1 Accuracy, Top-5 Accuracy, mAP, mAP IOU@0.5
Lipreading: Word Error Rate (WER)
Natural Language Transduction: Word Error Rate (WER)
Speech Recognition: Word Error Rate (WER), Test WER
Speech Separation: SI-SNRi, SDRi, PESQ, STOI
Talking Head Generation: FID, LSE-D, LSE-C
Visual Speech Recognition: Word Error Rate (WER)

Related Benchmarks

LRS2+VGGSound, Speech Denoising: CBAK, COVL, CSIG, PESQ, STOI

Statistics

Papers: 115
Benchmarks: 62

Tasks

10-shot image generation
3D
3D Face Modelling
3D Face Reconstruction
Audio-Visual Speech Recognition
Automatic Speech Recognition (ASR)
Face Generation
Face Reconstruction
Facial Recognition and Modelling
Image Generation
Image Manipulation
Keyword Spotting
Landmark-based Lipreading
Lipreading
Natural Language Transduction
Speech Recognition
Speech Separation
Talking Head Generation
Unconstrained Lip-synchronization
Visual Keyword Spotting
Visual Speech Recognition