Towards Automatic Face-to-Face Translation

Prajwal K R, Rudrabha Mukhopadhyay, Jerin Philip, Abhishek Jha, Vinay Namboodiri, C. V. Jawahar

Published: 2020-03-01. Venue: ACM Multimedia 2019.
Tasks: Speech-to-Speech Translation, Machine Translation, Translation, Unconstrained Lip-synchronization

Abstract

In light of recent breakthroughs in automatic machine translation systems, we propose a novel approach that we term "Face-to-Face Translation". As today's digital communication becomes increasingly visual, we argue that there is a need for systems that can automatically translate a video of a person speaking in language A into a target language B with realistic lip synchronization. In this work, we create an automatic pipeline for this problem and demonstrate its impact on multiple real-world applications. First, we build a working speech-to-speech translation system by bringing together multiple existing modules from speech and language. We then move towards "Face-to-Face Translation" by incorporating a novel visual module, LipGAN, for generating realistic talking faces from the translated audio. Quantitative evaluation of LipGAN on the standard LRW test set shows that it significantly outperforms existing approaches across all standard metrics. We also subject our Face-to-Face Translation pipeline to multiple human evaluations and show that it can significantly improve the overall user experience for consuming and interacting with multimodal content across languages. Code, models and demo video are made publicly available.

Demo video: https://www.youtube.com/watch?v=aHG6Oei8jF0
Code and models: https://github.com/Rudrabha/LipGAN
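The abstract describes a cascade: existing speech and language modules turn speech in language A into speech in language B, and LipGAN then generates lip-synced faces from the translated audio. The sketch below shows one way such stages might compose. It assumes the usual ASR, then MT, then TTS cascade for the speech-to-speech part; the `FaceToFacePipeline` class and the stage callables (`asr`, `translate`, `tts`, `lip_sync`) are hypothetical placeholders, not the authors' code or the LipGAN repository's API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical placeholder types: a real system would use waveforms and image arrays.
Audio = List[float]
Frames = List[str]


@dataclass
class FaceToFacePipeline:
    """Hypothetical cascade: speech in language A -> lip-synced video in language B."""
    asr: Callable[[Audio], str]                  # speech recognition, language A
    translate: Callable[[str], str]              # text translation, A -> B
    tts: Callable[[str], Audio]                  # speech synthesis, language B
    lip_sync: Callable[[Frames, Audio], Frames]  # talking-face generator (the role LipGAN plays)

    def run(self, frames: Frames, audio: Audio) -> Tuple[Frames, Audio]:
        text_a = self.asr(audio)                    # transcribe the source speech
        text_b = self.translate(text_a)             # translate the transcript
        audio_b = self.tts(text_b)                  # synthesize translated speech
        frames_b = self.lip_sync(frames, audio_b)   # re-render faces to match the new audio
        return frames_b, audio_b


# Toy stand-ins so the sketch runs end to end; none of these are real models.
pipeline = FaceToFacePipeline(
    asr=lambda audio: "hello",
    translate=lambda text: {"hello": "namaste"}.get(text, text),
    tts=lambda text: [0.0] * len(text),
    lip_sync=lambda frames, audio: [frame + "*" for frame in frames],
)
out_frames, out_audio = pipeline.run(["f0", "f1"], [0.1, 0.2])
print(out_frames, len(out_audio))  # ['f0*', 'f1*'] 7
```

Injecting each stage as a callable reflects the abstract's point that the speech-to-speech part reuses existing modules: any one stage can be swapped without touching the others.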

Results

Task	Dataset	Model	LMD (lower is better)	SSIM (higher is better)
Facial Recognition and Modelling	LRW	LipGAN	0.6	0.96
Image Generation	LRW	LipGAN	0.6	0.96
Face Generation	LRW	LipGAN	0.6	0.96
Face Reconstruction	LRW	LipGAN	0.6	0.96
3D	LRW	LipGAN	0.6	0.96
3D Face Modelling	LRW	LipGAN	0.6	0.96
3D Face Reconstruction	LRW	LipGAN	0.6	0.96
Talking Face Generation	LRW	LipGAN	0.6	0.96
10-shot image generation	LRW	LipGAN	0.6	0.96
1 Image, 2*2 Stitchi	LRW	LipGAN	0.6	0.96
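The table reports two metrics. LMD (landmark distance) is the mean Euclidean distance between corresponding facial landmarks of generated and ground-truth frames, commonly restricted to mouth landmarks when evaluating lip sync; SSIM (structural similarity) compares luminance, contrast, and structure. The plain-Python sketch below illustrates both under stated assumptions: landmarks are already detected and aligned, and `global_ssim` is a simplified single-window variant rather than standard windowed SSIM, so it will not reproduce the reported values. Both function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def landmark_distance(pred: Sequence[Sequence[Point]],
                      target: Sequence[Sequence[Point]]) -> float:
    """Mean Euclidean distance between corresponding landmarks (lower is better).

    pred and target are indexed [frame][landmark] -> (x, y); landmarks are
    assumed to be already extracted and aligned. The scale depends on how
    coordinates are normalized.
    """
    if len(pred) != len(target):
        raise ValueError("frame counts differ")
    total, count = 0.0, 0
    for pred_frame, target_frame in zip(pred, target):
        if len(pred_frame) != len(target_frame):
            raise ValueError("landmark counts differ")
        for (px, py), (tx, ty) in zip(pred_frame, target_frame):
            total += math.hypot(px - tx, py - ty)
            count += 1
    if count == 0:
        raise ValueError("no landmarks to compare")
    return total / count


def global_ssim(x: Sequence[float], y: Sequence[float], data_range: float = 255.0) -> float:
    """Single-window SSIM over two equal-length pixel sequences (higher is better).

    Standard SSIM averages this quantity over local (e.g. Gaussian) windows;
    this simplified variant treats the whole image as one window.
    """
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and equal length")
    n = len(x)
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))


print(landmark_distance([[(0, 0), (3, 4)]], [[(0, 0), (0, 0)]]))  # 2.5
```

Identical inputs give an SSIM of 1, and anti-correlated inputs can drive it below 0, which is why higher values indicate closer reconstructions.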
