Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Towards Automatic Face-to-Face Translation

Prajwal K R, Rudrabha Mukhopadhyay, Jerin Philip, Abhishek Jha, Vinay Namboodiri, C. V. Jawahar

Published: 2020-03-01 · Venue: ACM Multimedia 2019 (October 2019)
Tasks: Speech-to-Speech Translation, Machine Translation, Translation, Unconstrained Lip-synchronization

Abstract

In light of recent breakthroughs in automatic machine translation systems, we propose a novel approach that we term "Face-to-Face Translation". As today's digital communication becomes increasingly visual, we argue that there is a need for systems that can automatically translate a video of a person speaking in language A into a target language B with realistic lip synchronization. In this work, we create an automatic pipeline for this problem and demonstrate its impact on multiple real-world applications. First, we build a working speech-to-speech translation system by bringing together multiple existing modules from speech and language. We then move towards "Face-to-Face Translation" by incorporating a novel visual module, LipGAN, for generating realistic talking faces from the translated audio. Quantitative evaluation of LipGAN on the standard LRW test set shows that it significantly outperforms existing approaches across all standard metrics. We also subject our Face-to-Face Translation pipeline to multiple human evaluations and show that it can significantly improve the overall user experience for consuming and interacting with multimodal content across languages. Code, models, and a demo video are publicly available.

Demo video: https://www.youtube.com/watch?v=aHG6Oei8jF0
Code and models: https://github.com/Rudrabha/LipGAN
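The abstract describes a cascade: existing speech and language modules produce translated speech, and LipGAN then regenerates the speaker's lip region so it matches the new audio. The following is a minimal sketch of that composition, assuming the speech-to-speech stage is an ASR, machine translation, and text-to-speech cascade (the abstract does not name the individual modules). The class name, stage names, signatures, and toy stubs are hypothetical stand-ins, not code from the authors' LipGAN repository.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical type aliases: real systems would use waveform arrays and image tensors.
Audio = Sequence[float]
Frame = object


@dataclass
class FaceToFacePipeline:
    """Hypothetical cascade: speech (A) -> text (A) -> text (B) -> speech (B) -> lip-synced video."""
    asr: Callable[[Audio], str]                             # speech recognition in language A
    translate: Callable[[str], str]                         # text translation from A to B
    tts: Callable[[str], Audio]                             # speech synthesis in language B
    lip_sync: Callable[[List[Frame], Audio], List[Frame]]   # talking-face generator (LipGAN's role)

    def run(self, frames: List[Frame], audio: Audio) -> Tuple[List[Frame], Audio]:
        text_a = self.asr(audio)                  # transcribe the source speech
        text_b = self.translate(text_a)           # translate the transcript
        audio_b = self.tts(text_b)                # synthesize target-language speech
        frames_b = self.lip_sync(frames, audio_b) # re-render lips to match the new audio
        return frames_b, audio_b


# Toy stubs so the composition can be exercised end to end (illustrative only).
demo = FaceToFacePipeline(
    asr=lambda audio: "hello",
    translate=lambda text: {"hello": "namaste"}[text],
    tts=lambda text: [0.0] * len(text),
    lip_sync=lambda frames, audio: [f"{f}*" for f in frames],
)
frames_out, audio_out = demo.run(["f0", "f1"], [0.1, 0.2])
```

Keeping each stage as an injected callable mirrors the abstract's point that the speech-to-speech part reuses existing modules: any one of them can be swapped without touching the visual stage.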

Results

Task | Dataset | Metric | Value | Model
Facial Recognition and Modelling | LRW | LMD | 0.6 | LipGAN
Facial Recognition and Modelling | LRW | SSIM | 0.96 | LipGAN
Image Generation | LRW | LMD | 0.6 | LipGAN
Image Generation | LRW | SSIM | 0.96 | LipGAN
Face Generation | LRW | LMD | 0.6 | LipGAN
Face Generation | LRW | SSIM | 0.96 | LipGAN
Face Reconstruction | LRW | LMD | 0.6 | LipGAN
Face Reconstruction | LRW | SSIM | 0.96 | LipGAN
3D | LRW | LMD | 0.6 | LipGAN
3D | LRW | SSIM | 0.96 | LipGAN
3D Face Modelling | LRW | LMD | 0.6 | LipGAN
3D Face Modelling | LRW | SSIM | 0.96 | LipGAN
3D Face Reconstruction | LRW | LMD | 0.6 | LipGAN
3D Face Reconstruction | LRW | SSIM | 0.96 | LipGAN
Talking Face Generation | LRW | LMD | 0.6 | LipGAN
Talking Face Generation | LRW | SSIM | 0.96 | LipGAN
10-shot image generation | LRW | LMD | 0.6 | LipGAN
10-shot image generation | LRW | SSIM | 0.96 | LipGAN
1 Image, 2*2 Stitchi | LRW | LMD | 0.6 | LipGAN
1 Image, 2*2 Stitchi | LRW | SSIM | 0.96 | LipGAN
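The LRW results above use two metrics. As commonly defined in the talking-face literature, LMD (landmark distance) is the mean Euclidean distance between facial landmarks (typically mouth landmarks) detected on generated and ground-truth frames, so lower is better; SSIM (structural similarity) compares luminance, contrast, and structure, so higher is better, with a maximum of 1. The sketch below assumes landmarks have already been extracted by some detector, and it uses a simplified single-window SSIM instead of the standard Gaussian-windowed version, so its outputs are not directly comparable to the table. The function names are hypothetical, and the paper's exact landmark detector and normalization are not stated here.

```python
import numpy as np


def landmark_distance(pred, target):
    """Mean Euclidean distance between corresponding 2-D landmarks (lower is better).

    pred, target: arrays of shape (num_frames, num_landmarks, 2), assumed to be
    already extracted and aligned by some landmark detector.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if pred.shape != target.shape:
        raise ValueError("landmark arrays must have the same shape")
    # Per-landmark Euclidean distance, averaged over all landmarks and frames.
    return float(np.linalg.norm(pred - target, axis=-1).mean())


def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM over two whole grayscale images (higher is better).

    Simplification: the standard metric averages SSIM over local Gaussian
    windows; here the statistics are computed once over the full image.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(num / den)
```

For production evaluation, a windowed SSIM implementation (for example from scikit-image) is the safer choice; the global form is shown only because it makes the formula's three components easy to read.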
