CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior

Jinbo Xing, Menghan Xia, Yuechen Zhang, Xiaodong Cun, Jue Wang, Tien-Tsin Wong

2023-01-06 | CVPR 2023 | regression | 3D Face Animation

Abstract

Speech-driven 3D facial animation has been widely studied, yet there is still a gap to achieving realism and vividness due to the highly ill-posed nature and scarcity of audio-visual data. Existing works typically formulate the cross-modal mapping into a regression task, which suffers from the regression-to-mean problem leading to over-smoothed facial motions. In this paper, we propose to cast speech-driven facial animation as a code query task in a finite proxy space of the learned codebook, which effectively promotes the vividness of the generated motions by reducing the cross-modal mapping uncertainty. The codebook is learned by self-reconstruction over real facial motions and thus embedded with realistic facial motion priors. Over the discrete motion space, a temporal autoregressive model is employed to sequentially synthesize facial motions from the input speech signal, which guarantees lip-sync as well as plausible facial expressions. We demonstrate that our approach outperforms current state-of-the-art methods both qualitatively and quantitatively. Also, a user study further justifies our superiority in perceptual quality.
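The "code query" idea in the abstract can be illustrated with a generic vector-quantization lookup: each continuous motion feature is replaced by its nearest entry in a finite codebook, so the output is always drawn from a set of learned motion patterns. The sketch below is only an illustration under stated assumptions, not the authors' implementation. The function name `quantize`, the toy 2-D codebook, and the feature values are all hypothetical; in the paper the codebook is learned by self-reconstruction over real facial motions, and the codes are predicted from speech by a temporal autoregressive model.

```python
import numpy as np

def quantize(features, codebook):
    """Map each feature vector to its nearest codebook entry (squared L2 distance).

    features: array of shape (T, D), one vector per frame (hypothetical layout)
    codebook: array of shape (K, D), the finite set of codes
    Returns the chosen code indices (T,) and the quantized vectors (T, D).
    """
    # Pairwise differences between every feature and every code: (T, K, D)
    diffs = features[:, None, :] - codebook[None, :, :]
    # Squared distances: (T, K)
    dists = np.sum(diffs ** 2, axis=-1)
    # Index of the closest code for each frame
    indices = np.argmin(dists, axis=1)
    return indices, codebook[indices]

# Toy codebook with four 2-D codes (illustrative values only)
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
# Three noisy "motion features", each close to one of the codes
features = np.array([[0.9, 0.1], [0.1, 0.8], [0.2, 0.1]])

indices, quantized = quantize(features, codebook)
print(indices)    # nearest code index per frame
print(quantized)  # features snapped onto the codebook
```

Because every output is one of the K codes, a model that predicts code indices cannot average several plausible motions into a blurred one, which is the over-smoothing issue the abstract attributes to direct regression.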

Results

Task	Dataset	Metric	Value	Model
3D Face Animation	Biwi 3D Audiovisual Corpus of Affective Communication - B3D(AC)^2	FDD	4.117	CodeTalker
3D Face Animation	Biwi 3D Audiovisual Corpus of Affective Communication - B3D(AC)^2	Lip Vertex Error	4.7914	CodeTalker
3D Face Animation	BEAT2	MSE	8.026	CodeTalker
