Learning Temporal Coherence via Self-Supervision for GAN-based Video Generation

Mengyu Chu, You Xie, Jonas Mayer, Laura Leal-Taixé, Nils Thuerey

2018-11-23
Super-Resolution, Motion Compensation, Video Super-Resolution, Image Super-Resolution, Translation, Video Generation

Abstract

Our work explores temporal self-supervision for GAN-based video generation tasks. While adversarial training successfully yields generative models for a variety of areas, temporal relationships in the generated data are much less explored. Natural temporal changes are crucial for sequential generation tasks, e.g. video super-resolution and unpaired video translation. For the former, state-of-the-art methods often favor simpler norm losses such as $L^2$ over adversarial training. However, their averaging nature easily leads to temporally smooth results with an undesirable lack of spatial detail. For unpaired video translation, existing approaches modify the generator networks to form spatio-temporal cycle consistencies. In contrast, we focus on improving learning objectives and propose a temporally self-supervised algorithm. For both tasks, we show that temporal adversarial learning is key to achieving temporally coherent solutions without sacrificing spatial detail. We also propose a novel Ping-Pong loss to improve the long-term temporal consistency. It effectively prevents recurrent networks from accumulating artifacts temporally without depressing detailed features. Additionally, we propose a first set of metrics to quantitatively evaluate the accuracy as well as the perceptual quality of the temporal evolution. A series of user studies confirm the rankings computed with these metrics. Code, data, models, and results are provided at https://github.com/thunil/TecoGAN. The project page https://ge.in.tum.de/publications/2019-tecogan-chu/ contains supplemental materials.
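
The abstract names the Ping-Pong loss but does not give its formula. The sketch below illustrates one plausible reading, stated here as an assumption rather than the authors' definition: the input sequence is extended by appending its own reversal, a recurrent generator is run over the extended sequence, and the outputs for the same frame on the forward and backward passes are penalized for differing. The names `ping_pong_sequence`, `run_recurrent`, `ping_pong_loss`, and the toy `step` functions are hypothetical; frames are flat lists of floats standing in for image tensors.

```python
import math
from typing import Callable, List, Sequence

Frame = List[float]  # flattened frame, a stand-in for an image tensor
Step = Callable[[Frame, Frame], Frame]  # (current input, previous output) -> output


def ping_pong_sequence(frames: Sequence[Frame]) -> List[Frame]:
    """Extend a_1..a_n to a_1..a_n, a_{n-1}..a_1 (forward, then reversed)."""
    return list(frames) + list(frames[-2::-1])


def run_recurrent(step: Step, inputs: Sequence[Frame], initial: Frame) -> List[Frame]:
    """Run a recurrent generator, feeding each output back in as the next state."""
    outputs: List[Frame] = []
    prev = initial
    for x in inputs:
        prev = step(x, prev)
        outputs.append(prev)
    return outputs


def ping_pong_loss(step: Step, frames: Sequence[Frame], initial: Frame) -> float:
    """Sum of L2 distances between forward-pass and backward-pass outputs.

    Assumption: the loss pairs the output for frame t on the way forward
    with the output for the same frame t on the way back.
    """
    n = len(frames)
    outputs = run_recurrent(step, ping_pong_sequence(frames), initial)
    total = 0.0
    for t in range(n - 1):
        forward = outputs[t]
        backward = outputs[2 * (n - 1) - t]  # same frame, reached on the return pass
        total += math.sqrt(sum((f - b) ** 2 for f, b in zip(forward, backward)))
    return total


# Toy generators (hypothetical): one ignores its state, one accumulates it.
def stateless_step(x: Frame, prev: Frame) -> Frame:
    return list(x)


def drifting_step(x: Frame, prev: Frame) -> Frame:
    return [a + 0.5 * b for a, b in zip(x, prev)]


frames = [[1.0], [2.0], [3.0]]
loss_stateless = ping_pong_loss(stateless_step, frames, [0.0])
loss_drifting = ping_pong_loss(drifting_step, frames, [0.0])
```

Under this reading, a generator whose output does not drift over time incurs zero loss, while one that accumulates its recurrent state is penalized, which matches the abstract's stated goal of keeping recurrent networks from accumulating artifacts.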

Results

Task	Dataset	Metric	Value	Model
Super-Resolution	MSU Video Upscalers: Quality Enhancement	PSNR	26.6	TecoGAN
Super-Resolution	MSU Video Upscalers: Quality Enhancement	SSIM	0.933	TecoGAN
Super-Resolution	MSU Video Upscalers: Quality Enhancement	VMAF	61.2	TecoGAN
Super-Resolution	Vid4 - 4x upscaling	PSNR	25.89	TecoGAN⊖
Super-Resolution	Vid4 - 4x upscaling	PSNR	25.57	TecoGAN
Video Super-Resolution	MSU Video Upscalers: Quality Enhancement	PSNR	26.6	TecoGAN
Video Super-Resolution	MSU Video Upscalers: Quality Enhancement	SSIM	0.933	TecoGAN
Video Super-Resolution	MSU Video Upscalers: Quality Enhancement	VMAF	61.2	TecoGAN
Video Super-Resolution	Vid4 - 4x upscaling	PSNR	25.89	TecoGAN⊖
Video Super-Resolution	Vid4 - 4x upscaling	PSNR	25.57	TecoGAN

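For readers unfamiliar with the PSNR column, the following is a minimal sketch of the standard definition, PSNR = 10 · log10(MAX² / MSE), in decibels. It is not the evaluation code behind the values listed above; details such as which color channel is measured or whether borders are cropped are not stated on this page. The function name `psnr` and the flat-list representation of frames are assumptions made for illustration.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


# Pixels in [0, 1] with a uniform error of 0.1 give an MSE of 0.01.
value = psnr([0.0, 0.0], [0.1, 0.1], max_value=1.0)
```

Higher values indicate a smaller pixel-wise error relative to the ground truth. Because PSNR measures pixel-wise error only, it does not capture the temporal or perceptual quality that the abstract's proposed metrics target.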