Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

SelfTalk: A Self-Supervised Commutative Training Diagram to Comprehend 3D Talking Faces

Ziqiao Peng, Yihao Luo, Yue Shi, Hao Xu, Xiangyu Zhu, Jun He, Hongyan Liu, Zhaoxin Fan

2023-06-19 | 3D Face Animation | Lip Reading

Abstract

Speech-driven 3D face animation is a technique whose applications extend to various multimedia fields. Previous research has generated promising, realistic lip movements and facial expressions from audio signals. However, traditional regression models driven solely by data face several essential problems, such as difficulty in obtaining precise labels and domain gaps between modalities, which lead to unsatisfactory results that lack precision and coherence. To enhance the visual accuracy of generated lip movements while reducing the dependence on labeled data, we propose SelfTalk, a novel framework that introduces self-supervision into a cross-modal network system to learn 3D talking faces. The framework constructs a network system consisting of three modules: a facial animator, a speech recognizer, and a lip-reading interpreter. The core of SelfTalk is a commutative training diagram that facilitates the exchange of compatible features among audio, text, and lip shape, enabling our models to learn the intricate connections between these factors. The proposed framework leverages the knowledge learned by the lip-reading interpreter to generate more plausible lip shapes. Extensive experiments and user studies demonstrate that our proposed approach achieves state-of-the-art performance both qualitatively and quantitatively. We recommend watching the supplementary video.
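The idea of a commutative diagram over audio, lip shape, and text can be illustrated with a toy consistency check. This is only a sketch under stated assumptions: the abstract does not specify the loss or the feature spaces, so the mean-squared gap, the function names (`commutative_gap`, `mean_squared_gap`), and the toy modules below are all hypothetical stand-ins, not the authors' implementation. The sketch compares two paths that should agree: audio to text through the speech recognizer, and audio to lip shape to text through the facial animator and the lip-reading interpreter.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Module = Callable[[Sequence[float]], Vector]


def mean_squared_gap(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared difference between two equally sized feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def commutative_gap(
    audio: Sequence[float],
    animator: Module,
    speech_recognizer: Module,
    lip_reader: Module,
) -> float:
    """Discrepancy between the two paths from audio to text features.

    Path 1: audio -> speech_recognizer -> text features
    Path 2: audio -> animator -> lip shape -> lip_reader -> text features
    A value of 0.0 means the two paths agree exactly (assumed loss form).
    """
    text_from_audio = speech_recognizer(audio)
    text_from_lips = lip_reader(animator(audio))
    return mean_squared_gap(text_from_audio, text_from_lips)


# Toy stand-ins for the three modules (hypothetical, for illustration only).
def toy_animator(audio: Sequence[float]) -> Vector:
    return [2.0 * x for x in audio]          # "lip shape" is scaled audio


def toy_lip_reader(lips: Sequence[float]) -> Vector:
    return [0.5 * v for v in lips]           # undoes the animator's scaling


def toy_speech_recognizer(audio: Sequence[float]) -> Vector:
    return list(audio)                       # identity "text features"


def biased_lip_reader(lips: Sequence[float]) -> Vector:
    return [0.5 * v + 1.0 for v in lips]     # disagrees with the recognizer


audio_features = [0.1, 0.4, -0.3]
consistent_gap = commutative_gap(
    audio_features, toy_animator, toy_speech_recognizer, toy_lip_reader
)
inconsistent_gap = commutative_gap(
    audio_features, toy_animator, toy_speech_recognizer, biased_lip_reader
)
```

In a training setting, a quantity like `inconsistent_gap` would act as a self-supervised signal: it needs no ground-truth mesh labels, only agreement between modules.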

Results

Task | Dataset | Metric | Value | Model
3D Human Pose Estimation | Biwi 3D Audiovisual Corpus of Affective Communication - B3D(AC)^2 | FDD | 3.5761 | SelfTalk
3D Human Pose Estimation | Biwi 3D Audiovisual Corpus of Affective Communication - B3D(AC)^2 | Lip Vertex Error | 4.2485 | SelfTalk
Pose Estimation | Biwi 3D Audiovisual Corpus of Affective Communication - B3D(AC)^2 | FDD | 3.5761 | SelfTalk
Pose Estimation | Biwi 3D Audiovisual Corpus of Affective Communication - B3D(AC)^2 | Lip Vertex Error | 4.2485 | SelfTalk
3D | Biwi 3D Audiovisual Corpus of Affective Communication - B3D(AC)^2 | FDD | 3.5761 | SelfTalk
3D | Biwi 3D Audiovisual Corpus of Affective Communication - B3D(AC)^2 | Lip Vertex Error | 4.2485 | SelfTalk
3D Face Animation | Biwi 3D Audiovisual Corpus of Affective Communication - B3D(AC)^2 | FDD | 3.5761 | SelfTalk
3D Face Animation | Biwi 3D Audiovisual Corpus of Affective Communication - B3D(AC)^2 | Lip Vertex Error | 4.2485 | SelfTalk
2D Human Pose Estimation | Biwi 3D Audiovisual Corpus of Affective Communication - B3D(AC)^2 | FDD | 3.5761 | SelfTalk
2D Human Pose Estimation | Biwi 3D Audiovisual Corpus of Affective Communication - B3D(AC)^2 | Lip Vertex Error | 4.2485 | SelfTalk
3D Absolute Human Pose Estimation | Biwi 3D Audiovisual Corpus of Affective Communication - B3D(AC)^2 | FDD | 3.5761 | SelfTalk
3D Absolute Human Pose Estimation | Biwi 3D Audiovisual Corpus of Affective Communication - B3D(AC)^2 | Lip Vertex Error | 4.2485 | SelfTalk
1 Image, 2*2 Stitchi | Biwi 3D Audiovisual Corpus of Affective Communication - B3D(AC)^2 | FDD | 3.5761 | SelfTalk
1 Image, 2*2 Stitchi | Biwi 3D Audiovisual Corpus of Affective Communication - B3D(AC)^2 | Lip Vertex Error | 4.2485 | SelfTalk
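
For readers unfamiliar with the Lip Vertex Error metric, the following sketch shows one common way it is computed: for each frame, take the largest error over the lip-region vertices, then average over frames. This is an assumption about the metric's usual form, not a statement of how the value in the table was produced. Some implementations use the squared distance rather than the Euclidean distance, and the page does not state the scale or units. The function name `lip_vertex_error` and the `lip_indices` argument are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vertex = Tuple[float, float, float]
Frame = Sequence[Vertex]


def lip_vertex_error(
    predicted: Sequence[Frame],
    target: Sequence[Frame],
    lip_indices: Sequence[int],
) -> float:
    """Mean over frames of the maximum Euclidean error among lip vertices.

    predicted, target: one list of (x, y, z) vertices per frame.
    lip_indices: positions of the lip-region vertices in each frame.
    """
    if len(predicted) != len(target) or not predicted:
        raise ValueError("sequences must be non-empty and equally long")
    if not lip_indices:
        raise ValueError("lip_indices must not be empty")

    total = 0.0
    for pred_frame, target_frame in zip(predicted, target):
        # Worst-case lip vertex in this frame; non-lip vertices are ignored.
        total += max(
            math.dist(pred_frame[i], target_frame[i]) for i in lip_indices
        )
    return total / len(predicted)


# Two frames, three vertices each; vertices 1 and 2 form the "lip" region.
target_frames = [
    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
]
predicted_frames = [
    [(9.0, 9.0, 9.0), (3.0, 4.0, 0.0), (0.0, 0.0, 0.0)],  # max lip error 5
    [(9.0, 9.0, 9.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)],  # max lip error 1
]
lve = lip_vertex_error(predicted_frames, target_frames, lip_indices=[1, 2])
```

Here the large error on vertex 0 does not affect the result because that vertex is outside the assumed lip region; the per-frame maxima are 5 and 1, so `lve` is their mean.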

Related Papers

VisualSpeaker: Visually-Guided 3D Avatar Lip Synthesis (2025-07-08)
SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer (2025-05-07)
Transforming faces into video stories -- VideoFace2.0 (2025-05-04)
Development and evaluation of a deep learning algorithm for German word recognition from lip movements (2025-04-22)
Chinese-LiPS: A Chinese audio-visual speech recognition dataset with Lip-reading and Presentation Slides (2025-04-21)
VALLR: Visual ASR Language Model for Lip Reading (2025-03-27)
DiffusionTalker: Efficient and Compact Speech-Driven 3D Talking Head via Personalizer-Guided Distillation (2025-03-23)
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning (2025-03-18)