Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

FaceFormer: Speech-Driven 3D Facial Animation with Transformers

Yingruo Fan, Zhaojiang Lin, Jun Saito, Wenping Wang, Taku Komura

2021-12-10 | CVPR 2022 | 3D Face Animation

Abstract

Speech-driven 3D facial animation is challenging due to the complex geometry of human faces and the limited availability of 3D audio-visual data. Prior works typically focus on learning phoneme-level features from short audio windows with limited context, which occasionally results in inaccurate lip movements. To tackle this limitation, we propose a Transformer-based autoregressive model, FaceFormer, which encodes the long-term audio context and autoregressively predicts a sequence of animated 3D face meshes. To cope with the data scarcity issue, we integrate self-supervised pre-trained speech representations. We also devise two biased attention mechanisms well suited to this task: the biased cross-modal multi-head (MH) attention, and the biased causal MH self-attention with a periodic positional encoding strategy. The former effectively aligns the audio and motion modalities, whereas the latter enables generalization to longer audio sequences. Extensive experiments and a perceptual user study show that our approach outperforms existing state-of-the-art methods. The code will be made available.
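
As a rough illustration of the biased causal self-attention and periodic positional encoding mentioned in the abstract, the NumPy sketch below shows one way such components could look. The abstract gives no formulas, so the specific choices here are assumptions made for illustration and not the authors' implementation: a sinusoidal encoding computed over `t mod period`, a causal bias that grows by one for every full period separating query and key, and the hypothetical names `periodic_positional_encoding`, `biased_causal_mask`, and `attention_weights`.

```python
import numpy as np

def periodic_positional_encoding(seq_len, d_model, period):
    """Sinusoidal encoding over (t mod period); assumes d_model is even."""
    positions = np.arange(seq_len) % period
    div = np.exp(np.arange(0, d_model, 2) * (-np.log(10000.0) / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(positions[:, None] * div)
    pe[:, 1::2] = np.cos(positions[:, None] * div)
    return pe

def biased_causal_mask(seq_len, period):
    """Additive attention bias: -inf for future frames, and an assumed
    penalty of one unit per full period between query i and key j."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    bias = (-((i - j) // period)).astype(float)
    bias[j > i] = -np.inf  # causal: frame i never attends to later frames
    return bias

def attention_weights(scores, bias):
    """Row-wise softmax of (scores + bias), computed stably."""
    z = scores + bias
    z = z - z.max(axis=-1, keepdims=True)
    w = np.exp(z)
    return w / w.sum(axis=-1, keepdims=True)

# Usage: 6 frames, period of 2 frames, uniform raw scores.
pe = periodic_positional_encoding(seq_len=6, d_model=4, period=2)
bias = biased_causal_mask(seq_len=6, period=2)
weights = attention_weights(np.zeros((6, 6)), bias)
```

Because the encoding depends only on `t mod period`, it repeats for sequences longer than any seen in training, and the bias favors recent frames within the causal window. Both properties are consistent with the abstract's claim about generalizing to longer audio sequences, though the exact mechanism in the paper may differ from this sketch.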

Results

Task | Dataset | Metric | Value | Model
3D Human Pose Estimation | Biwi 3D Audiovisual Corpus of Affective Communication - B3D(AC)^2 | FDD | 4.6408 | FaceFormer
3D Human Pose Estimation | Biwi 3D Audiovisual Corpus of Affective Communication - B3D(AC)^2 | Lip Vertex Error | 5.3077 | FaceFormer
3D Human Pose Estimation | VOCASET | Lip Vertex Error | 5.3742 | FaceFormer
3D Human Pose Estimation | BEAT2 | MSE | 7.787 | FaceFormer
Pose Estimation | Biwi 3D Audiovisual Corpus of Affective Communication - B3D(AC)^2 | FDD | 4.6408 | FaceFormer
Pose Estimation | Biwi 3D Audiovisual Corpus of Affective Communication - B3D(AC)^2 | Lip Vertex Error | 5.3077 | FaceFormer
Pose Estimation | VOCASET | Lip Vertex Error | 5.3742 | FaceFormer
Pose Estimation | BEAT2 | MSE | 7.787 | FaceFormer
3D | Biwi 3D Audiovisual Corpus of Affective Communication - B3D(AC)^2 | FDD | 4.6408 | FaceFormer
3D | Biwi 3D Audiovisual Corpus of Affective Communication - B3D(AC)^2 | Lip Vertex Error | 5.3077 | FaceFormer
3D | VOCASET | Lip Vertex Error | 5.3742 | FaceFormer
3D | BEAT2 | MSE | 7.787 | FaceFormer
3D Face Animation | Biwi 3D Audiovisual Corpus of Affective Communication - B3D(AC)^2 | FDD | 4.6408 | FaceFormer
3D Face Animation | Biwi 3D Audiovisual Corpus of Affective Communication - B3D(AC)^2 | Lip Vertex Error | 5.3077 | FaceFormer
3D Face Animation | VOCASET | Lip Vertex Error | 5.3742 | FaceFormer
3D Face Animation | BEAT2 | MSE | 7.787 | FaceFormer
2D Human Pose Estimation | Biwi 3D Audiovisual Corpus of Affective Communication - B3D(AC)^2 | FDD | 4.6408 | FaceFormer
2D Human Pose Estimation | Biwi 3D Audiovisual Corpus of Affective Communication - B3D(AC)^2 | Lip Vertex Error | 5.3077 | FaceFormer
2D Human Pose Estimation | VOCASET | Lip Vertex Error | 5.3742 | FaceFormer
2D Human Pose Estimation | BEAT2 | MSE | 7.787 | FaceFormer
3D Absolute Human Pose Estimation | Biwi 3D Audiovisual Corpus of Affective Communication - B3D(AC)^2 | FDD | 4.6408 | FaceFormer
3D Absolute Human Pose Estimation | Biwi 3D Audiovisual Corpus of Affective Communication - B3D(AC)^2 | Lip Vertex Error | 5.3077 | FaceFormer
3D Absolute Human Pose Estimation | VOCASET | Lip Vertex Error | 5.3742 | FaceFormer
3D Absolute Human Pose Estimation | BEAT2 | MSE | 7.787 | FaceFormer
1 Image, 2*2 Stitchi | Biwi 3D Audiovisual Corpus of Affective Communication - B3D(AC)^2 | FDD | 4.6408 | FaceFormer
1 Image, 2*2 Stitchi | Biwi 3D Audiovisual Corpus of Affective Communication - B3D(AC)^2 | Lip Vertex Error | 5.3077 | FaceFormer
1 Image, 2*2 Stitchi | VOCASET | Lip Vertex Error | 5.3742 | FaceFormer
1 Image, 2*2 Stitchi | BEAT2 | MSE | 7.787 | FaceFormer

Related Papers

DiffusionTalker: Efficient and Compact Speech-Driven 3D Talking Head via Personalizer-Guided Distillation (2025-03-23)
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning (2025-03-18)
Towards High-fidelity 3D Talking Avatar with Personalized Dynamic Texture (2025-01-01)
EmoFace: Emotion-Content Disentangled Speech-Driven 3D Talking Face Animation (2024-08-21)
Meta-Learning Empowered Meta-Face: Personalized Speaking Style Adaptation for Audio-Driven 3D Talking Face Animation (2024-08-18)
DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation (2024-08-12)
EmoFace: Audio-driven Emotional 3D Face Animation (2024-07-17)
LeGO: Leveraging a Surface Deformation Network for Animatable Stylized Face Generation with One Example (2024-03-22)