Fast Bi-layer Neural Synthesis of One-Shot Realistic Head Avatars

Egor Zakharov, Aleksei Ivakhnenko, Aliaksandra Shysheya, Victor Lempitsky

2020-08-24 | ECCV 2020 | Neural Rendering, Talking Head Generation

Abstract

We propose a neural rendering-based system that creates head avatars from a single photograph. Our approach models a person's appearance by decomposing it into two layers. The first layer is a pose-dependent coarse image that is synthesized by a small neural network. The second layer is defined by a pose-independent texture image that contains high-frequency details. The texture image is generated offline, warped and added to the coarse image to ensure a high effective resolution of synthesized head views. We compare our system to analogous state-of-the-art systems in terms of visual quality and speed. The experiments show significant inference speedup over previous neural head avatar models for a given visual quality. We also report on a real-time smartphone-based implementation of our system.
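
The two-layer composition described in the abstract can be sketched as below. This is a minimal NumPy illustration, not the authors' implementation. The function names `warp_texture` and `compose_bilayer`, the warp-field convention (absolute source coordinates in pixels), nearest-neighbour sampling, and clipping to [0, 1] are all assumptions made for the example. In the described system the coarse image would come from the small pose-dependent network; the abstract does not say how the warp is obtained.

```python
import numpy as np


def warp_texture(texture, warp_field):
    """Sample `texture` (H, W, C) at the coordinates given by `warp_field` (H, W, 2).

    Assumed convention: warp_field[y, x] = (src_x, src_y) in pixel units.
    Nearest-neighbour sampling keeps the sketch short; bilinear sampling
    would be the more usual choice in practice.
    """
    h, w = texture.shape[:2]
    src_x = np.clip(np.rint(warp_field[..., 0]), 0, w - 1).astype(int)
    src_y = np.clip(np.rint(warp_field[..., 1]), 0, h - 1).astype(int)
    return texture[src_y, src_x]


def compose_bilayer(coarse, texture, warp_field):
    """Add the warped high-frequency texture to the pose-dependent coarse image."""
    return np.clip(coarse + warp_texture(texture, warp_field), 0.0, 1.0)


# Toy inputs (hypothetical values): a flat grey coarse image and a texture
# holding a single bright detail at row 1, column 2.
h, w = 4, 4
ys, xs = np.mgrid[0:h, 0:w]
identity_warp = np.stack([xs, ys], axis=-1).astype(float)
coarse = np.full((h, w, 3), 0.5)
texture = np.zeros((h, w, 3))
texture[1, 2] = 0.25

# With an identity warp the detail stays where the texture stores it.
same_pose = compose_bilayer(coarse, texture, identity_warp)

# Sampling one pixel to the right moves the detail one pixel to the left.
shifted_warp = identity_warp + np.array([1.0, 0.0])
new_pose = compose_bilayer(coarse, texture, shifted_warp)
```

The texture is computed once per person and only the coarse image and the warp change per frame, which matches the abstract's description of the texture as pose-independent and generated offline.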

Results

Dataset: VoxCeleb2 - 1-shot learning. All models are the medium-size variants.
The same results are listed under each of these tasks: Facial Recognition and Modelling, Image Generation, Talking Head Generation, Face Generation, Face Reconstruction, 3D, 3D Face Modelling, 3D Face Reconstruction, 10-shot image generation.

Model	CSIM	LPIPS	Normalized Pose Error	SSIM	Inference time (ms)
Fast Bi-layer Avatars (medium size)	0.653	0.358	43.3	0.508	4
First Order Motion Model (medium size)	0.638	0.311	47.8	0.553	13
Few-shot Vid-to-vid (medium size)	0.604	0.368	46.1	0.419	22
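
To make the speed comparison concrete, the short sketch below derives speedup factors from the inference times reported above. The helper name `speedup_over` and the dictionary layout are illustrative assumptions; only the millisecond values come from the results.

```python
# Inference times in milliseconds, copied from the results above
# (medium-size models, VoxCeleb2 1-shot setting).
inference_ms = {
    "Fast Bi-layer Avatars": 4,
    "First Order Motion Model": 13,
    "Few-shot Vid-to-vid": 22,
}


def speedup_over(baseline, ours="Fast Bi-layer Avatars", times=inference_ms):
    """Return how many times faster `ours` is than `baseline`."""
    return times[baseline] / times[ours]


speedups = {
    name: speedup_over(name)
    for name in inference_ms
    if name != "Fast Bi-layer Avatars"
}
```

Under these numbers the bi-layer model is 3.25 times faster than First Order Motion Model and 5.5 times faster than Few-shot Vid-to-vid.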
