
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild

K R Prajwal, Rudrabha Mukhopadhyay, Vinay Namboodiri, C. V. Jawahar

2020-08-23 | Talking Head Generation | All | Talking Face Generation | MORPH | Unconstrained Lip-synchronization

Abstract

In this work, we investigate the problem of lip-syncing a talking face video of an arbitrary identity to match a target speech segment. Current works excel at producing accurate lip movements on a static image or videos of specific people seen during the training phase. However, they fail to accurately morph the lip movements of arbitrary identities in dynamic, unconstrained talking face videos, resulting in significant parts of the video being out-of-sync with the new audio. We identify key reasons pertaining to this and hence resolve them by learning from a powerful lip-sync discriminator. Next, we propose new, rigorous evaluation benchmarks and metrics to accurately measure lip synchronization in unconstrained videos. Extensive quantitative evaluations on our challenging benchmarks show that the lip-sync accuracy of the videos generated by our Wav2Lip model is almost as good as real synced videos. We provide a demo video clearly showing the substantial impact of our Wav2Lip model and evaluation benchmarks on our website: cvit.iiit.ac.in/research/projects/cvit-projects/a-lip-sync-expert-is-all-you-need-for-speech-to-lip-generation-in-the-wild. The code and models are released at this GitHub repository: github.com/Rudrabha/Wav2Lip. You can also try out the interactive demo at this link: bhaasha.iiit.ac.in/lipsync.

Results

Dataset   Metric   Value   Model
LRS2      FID      4.446   Wav2Lip + GAN
LRS2      LSE-D    6.469   Wav2Lip + GAN
LRS2      FID      4.887   Wav2Lip
LRS2      LSE-C    7.781   Wav2Lip
LRS2      LSE-D    6.386   Wav2Lip
LRS3      FID      4.35    Wav2Lip + GAN
LRS3      LSE-C    7.574   Wav2Lip + GAN
LRS3      LSE-D    6.986   Wav2Lip + GAN
LRS3      FID      4.844   Wav2Lip
LRS3      LSE-C    7.887   Wav2Lip
LRS3      LSE-D    6.652   Wav2Lip
LRW       FID      2.475   Wav2Lip + GAN
LRW       LSE-C    7.263   Wav2Lip + GAN
LRW       LSE-D    6.774   Wav2Lip + GAN
LRW       FID      3.189   Wav2Lip
LRW       LSE-C    7.49    Wav2Lip
LRW       LSE-D    6.512   Wav2Lip

The same 17 rows are listed under each of the following tasks: Facial Recognition and Modelling, Image Generation, Talking Head Generation, Face Generation, Face Reconstruction, 3D, 3D Face Modelling, 3D Face Reconstruction, 10-shot image generation.
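
The two models in the results above can be compared metric by metric. The following is a minimal sketch, not part of the original page: the values are copied from the listed rows, while the `compare` helper and the `LOWER_IS_BETTER` mapping are illustrative names. The direction of each metric is an assumption (lower is better for FID and LSE-D, higher is better for LSE-C).

```python
# Values copied from the results listing: (dataset, metric, model) -> value.
RESULTS = {
    ("LRS2", "FID", "Wav2Lip + GAN"): 4.446,
    ("LRS2", "LSE-D", "Wav2Lip + GAN"): 6.469,
    ("LRS2", "FID", "Wav2Lip"): 4.887,
    ("LRS2", "LSE-C", "Wav2Lip"): 7.781,
    ("LRS2", "LSE-D", "Wav2Lip"): 6.386,
    ("LRS3", "FID", "Wav2Lip + GAN"): 4.35,
    ("LRS3", "LSE-C", "Wav2Lip + GAN"): 7.574,
    ("LRS3", "LSE-D", "Wav2Lip + GAN"): 6.986,
    ("LRS3", "FID", "Wav2Lip"): 4.844,
    ("LRS3", "LSE-C", "Wav2Lip"): 7.887,
    ("LRS3", "LSE-D", "Wav2Lip"): 6.652,
    ("LRW", "FID", "Wav2Lip + GAN"): 2.475,
    ("LRW", "LSE-C", "Wav2Lip + GAN"): 7.263,
    ("LRW", "LSE-D", "Wav2Lip + GAN"): 6.774,
    ("LRW", "FID", "Wav2Lip"): 3.189,
    ("LRW", "LSE-C", "Wav2Lip"): 7.49,
    ("LRW", "LSE-D", "Wav2Lip"): 6.512,
}

# Assumption: metric directions are not stated on the page itself.
LOWER_IS_BETTER = {"FID": True, "LSE-D": True, "LSE-C": False}


def compare(results, base="Wav2Lip", variant="Wav2Lip + GAN"):
    """Return {(dataset, metric): (variant - base, better model)}.

    Only (dataset, metric) pairs reported for both models are included.
    """
    out = {}
    for (dataset, metric, model), base_value in results.items():
        if model != base:
            continue
        variant_value = results.get((dataset, metric, variant))
        if variant_value is None:
            continue  # e.g. LRS2 LSE-C has no "Wav2Lip + GAN" row in the listing
        delta = round(variant_value - base_value, 3)
        if delta == 0:
            winner = "tie"
        else:
            variant_better = (delta < 0) == LOWER_IS_BETTER[metric]
            winner = variant if variant_better else base
        out[(dataset, metric)] = (delta, winner)
    return out


comparison = compare(RESULTS)
for (dataset, metric), (delta, winner) in sorted(comparison.items()):
    print(f"{dataset:5s} {metric:6s} delta={delta:+.3f}  better: {winner}")
```

Under the assumed metric directions, Wav2Lip + GAN has the lower FID on all three datasets, while plain Wav2Lip has the better LSE-C and LSE-D wherever both models are listed. The LRS2 LSE-C pair is skipped because the listing has no Wav2Lip + GAN value for it.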

Related Papers

Modeling Code: Is Text All You Need? (2025-07-15)
All Eyes, no IMU: Learning Flight Attitude from Vision Alone (2025-07-15)
MEDTalk: Multimodal Controlled 3D Facial Animation with Dynamic Emotions by Disentangled Embedding (2025-07-08)
Is Diversity All You Need for Scalable Robotic Manipulation? (2025-07-08)
DESIGN AND IMPLEMENTATION OF ONLINE CLEARANCE REPORT. (2025-07-07)
Is Reasoning All You Need? Probing Bias in the Age of Reasoning Language Models (2025-07-03)
Prompt2SegCXR: Prompt to Segment All Organs and Diseases in Chest X-rays (2025-07-01)
State and Memory is All You Need for Robust and Reliable AI Agents (2025-06-30)