A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild

K R Prajwal, Rudrabha Mukhopadhyay, Vinay Namboodiri, C. V. Jawahar

2020-08-23

Talking Head Generation, Talking Face Generation, MORPH, Unconstrained Lip-synchronization

Abstract

In this work, we investigate the problem of lip-syncing a talking face video of an arbitrary identity to match a target speech segment. Current works excel at producing accurate lip movements on a static image or videos of specific people seen during the training phase. However, they fail to accurately morph the lip movements of arbitrary identities in dynamic, unconstrained talking face videos, resulting in significant parts of the video being out-of-sync with the new audio. We identify key reasons pertaining to this and hence resolve them by learning from a powerful lip-sync discriminator. Next, we propose new, rigorous evaluation benchmarks and metrics to accurately measure lip synchronization in unconstrained videos. Extensive quantitative evaluations on our challenging benchmarks show that the lip-sync accuracy of the videos generated by our Wav2Lip model is almost as good as real synced videos. We provide a demo video clearly showing the substantial impact of our Wav2Lip model and evaluation benchmarks on our website: cvit.iiit.ac.in/research/projects/cvit-projects/a-lip-sync-expert-is-all-you-need-for-speech-to-lip-generation-in-the-wild. The code and models are released at this GitHub repository: github.com/Rudrabha/Wav2Lip. You can also try out the interactive demo at this link: bhaasha.iiit.ac.in/lipsync.
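
As a rough illustration of what learning from a lip-sync discriminator can look like, the sketch below scores a pair of video and audio embeddings by cosine similarity and turns that score into a penalty. The abstract does not specify this formulation: the function names `sync_probability` and `sync_loss`, the toy embedding vectors, and the clamping constant `EPS` are all assumptions made for illustration.

```python
import math

EPS = 1e-8  # assumed small constant to avoid division by zero and log(0)


def sync_probability(video_emb, audio_emb):
    """Cosine similarity of two embeddings, clamped to [EPS, 1].

    Hypothetical helper: the similarity is treated as the probability
    that the video window and the audio window are in sync.
    """
    dot = sum(v * a for v, a in zip(video_emb, audio_emb))
    norm = math.sqrt(sum(v * v for v in video_emb)) * math.sqrt(
        sum(a * a for a in audio_emb)
    )
    cos = dot / max(norm, EPS)
    return min(max(cos, EPS), 1.0)


def sync_loss(pairs):
    """Mean negative log of the sync probability over (video, audio) pairs."""
    return sum(-math.log(sync_probability(v, a)) for v, a in pairs) / len(pairs)


# Toy embeddings (made up): same direction vs. orthogonal.
aligned = ([1.0, 0.0, 2.0], [2.0, 0.0, 4.0])
misaligned = ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])

print(sync_loss([aligned]) < sync_loss([misaligned]))  # prints True
```

In a setup like this, a generator would be penalized more heavily when the discriminator judges its frames to be poorly matched to the audio, which is the general idea of learning from a sync expert.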

Results

The results below are listed identically under each of the following tasks: Facial Recognition and Modelling, Image Generation, Talking Head Generation, Face Generation, Face Reconstruction, 3D, 3D Face Modelling, 3D Face Reconstruction, 10-shot image generation.

Dataset	Metric	Value	Model
LRS2	FID	4.446	Wav2Lip + GAN
LRS2	LSE-D	6.469	Wav2Lip + GAN
LRS2	FID	4.887	Wav2Lip
LRS2	LSE-C	7.781	Wav2Lip
LRS2	LSE-D	6.386	Wav2Lip
LRS3	FID	4.35	Wav2Lip + GAN
LRS3	LSE-C	7.574	Wav2Lip + GAN
LRS3	LSE-D	6.986	Wav2Lip + GAN
LRS3	FID	4.844	Wav2Lip
LRS3	LSE-C	7.887	Wav2Lip
LRS3	LSE-D	6.652	Wav2Lip
LRW	FID	2.475	Wav2Lip + GAN
LRW	LSE-C	7.263	Wav2Lip + GAN
LRW	LSE-D	6.774	Wav2Lip + GAN
LRW	FID	3.189	Wav2Lip
LRW	LSE-C	7.49	Wav2Lip
LRW	LSE-D	6.512	Wav2Lip
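
To read the table more easily, the rows can be grouped by dataset and metric so the two models sit side by side. The snippet below is a minimal sketch that hard-codes the values listed above; `ROWS`, `side_by_side`, and `better_model` are illustrative names, not part of any released tooling. It also assumes the usual reading of these metrics (lower FID and LSE-D are better, higher LSE-C is better), which this page does not state.

```python
from collections import defaultdict

# Values copied from the results table above: (dataset, metric, value, model).
ROWS = [
    ("LRS2", "FID", 4.446, "Wav2Lip + GAN"),
    ("LRS2", "LSE-D", 6.469, "Wav2Lip + GAN"),
    ("LRS2", "FID", 4.887, "Wav2Lip"),
    ("LRS2", "LSE-C", 7.781, "Wav2Lip"),
    ("LRS2", "LSE-D", 6.386, "Wav2Lip"),
    ("LRS3", "FID", 4.35, "Wav2Lip + GAN"),
    ("LRS3", "LSE-C", 7.574, "Wav2Lip + GAN"),
    ("LRS3", "LSE-D", 6.986, "Wav2Lip + GAN"),
    ("LRS3", "FID", 4.844, "Wav2Lip"),
    ("LRS3", "LSE-C", 7.887, "Wav2Lip"),
    ("LRS3", "LSE-D", 6.652, "Wav2Lip"),
    ("LRW", "FID", 2.475, "Wav2Lip + GAN"),
    ("LRW", "LSE-C", 7.263, "Wav2Lip + GAN"),
    ("LRW", "LSE-D", 6.774, "Wav2Lip + GAN"),
    ("LRW", "FID", 3.189, "Wav2Lip"),
    ("LRW", "LSE-C", 7.49, "Wav2Lip"),
    ("LRW", "LSE-D", 6.512, "Wav2Lip"),
]

# Assumed metric directions (not stated on this page).
HIGHER_IS_BETTER = {"FID": False, "LSE-D": False, "LSE-C": True}


def side_by_side(rows):
    """Group values as {(dataset, metric): {model: value}}."""
    grouped = defaultdict(dict)
    for dataset, metric, value, model in rows:
        grouped[(dataset, metric)][model] = value
    return dict(grouped)


def better_model(scores, metric):
    """Return the better-scoring model, or None if only one model is listed."""
    if len(scores) < 2:
        return None
    pick = max if HIGHER_IS_BETTER[metric] else min
    return pick(scores, key=scores.get)


table = side_by_side(ROWS)
for (dataset, metric), scores in sorted(table.items()):
    print(dataset, metric, scores, "->", better_model(scores, metric))
```

Under those assumed directions, Wav2Lip + GAN has the lower FID on all three datasets, while Wav2Lip has the lower LSE-D and the higher LSE-C wherever both models are listed. No LSE-C value is listed for Wav2Lip + GAN on LRS2, so that pair is left uncompared.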
