Exploring Spatial-Temporal Multi-Frequency Analysis for High-Fidelity and Temporal-Consistency Video Prediction

Beibei Jin, Yu Hu, Qiankun Tang, Jingyu Niu, Zhiping Shi, Yinhe Han, Xiaowei Li

2020-02-23 | CVPR 2020 | Video Prediction, Prediction, Video Generation

Abstract

Video prediction is a pixel-wise dense prediction task that infers future frames from past frames. Missing appearance details and motion blur remain two major problems for current predictive models, leading to image distortion and temporal inconsistency. In this paper, we point out the necessity of exploring multi-frequency analysis to deal with these two problems. Inspired by the frequency band decomposition characteristic of the Human Visual System (HVS), we propose a video prediction network based on multi-level wavelet analysis that handles spatial and temporal information in a unified manner. Specifically, the multi-level spatial discrete wavelet transform decomposes each video frame into anisotropic sub-bands with multiple frequencies, helping to enrich structural information and preserve fine details. In addition, the multi-level temporal discrete wavelet transform, which operates on the time axis, decomposes the frame sequence into sub-band groups of different frequencies to accurately capture multi-frequency motions under a fixed frame rate. Extensive experiments on diverse datasets demonstrate that our model shows significant improvements in fidelity and temporal consistency over state-of-the-art works.
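To make the two decompositions in the abstract concrete, the following is a minimal sketch in plain Python. It is not the authors' network or code. It assumes the Haar wavelet in its simple averaging form (the abstract does not say which wavelet is used), frames stored as nested lists with even dimensions, and hypothetical function names chosen for illustration. It shows how a spatial transform splits one frame into four sub-bands, how repeating it on the low-frequency band gives a multi-level decomposition, and how the same filter applied along the time axis separates slow and fast changes between frames.

```python
def haar_1d(signal):
    """One Haar level on an even-length sequence (averaging form, an assumption)."""
    low = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return low, high


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def filter_columns(matrix):
    """Apply haar_1d down every column and return (low, high) matrices."""
    lows, highs = [], []
    for column in transpose(matrix):
        low, high = haar_1d(column)
        lows.append(low)
        highs.append(high)
    return transpose(lows), transpose(highs)


def haar_2d(frame):
    """Spatial transform: filter along rows, then along columns.

    Returns four sub-bands: ll (low/low), lh (low along rows, high along
    columns), hl (high along rows, low along columns), hh (high/high).
    """
    row_low, row_high = [], []
    for row in frame:
        low, high = haar_1d(row)
        row_low.append(low)
        row_high.append(high)
    ll, lh = filter_columns(row_low)
    hl, hh = filter_columns(row_high)
    return ll, lh, hl, hh


def haar_2d_multilevel(frame, levels):
    """Repeat the spatial transform on the low-frequency band."""
    details = []
    ll = frame
    for _ in range(levels):
        ll, lh, hl, hh = haar_2d(ll)
        details.append((lh, hl, hh))
    return ll, details


def haar_temporal(frames):
    """Temporal transform: filter each pixel along the time axis.

    Expects an even number of equally sized frames and returns a
    low-frequency group and a high-frequency group of frames.
    """
    low_group, high_group = [], []
    for t in range(0, len(frames), 2):
        first, second = frames[t], frames[t + 1]
        low_group.append([[(a + b) / 2 for a, b in zip(ra, rb)]
                          for ra, rb in zip(first, second)])
        high_group.append([[(a - b) / 2 for a, b in zip(ra, rb)]
                           for ra, rb in zip(first, second)])
    return low_group, high_group
```

In this sketch a perfectly flat frame produces all-zero detail sub-bands, and a sequence with no motion produces an all-zero high-frequency temporal group, which is the intuition behind using the high-frequency bands to carry fine detail and fast motion.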

Results

Task	Dataset	Metric	Value	Model
Video Prediction	KTH	Cond	10	WAM
Video Prediction	KTH	PSNR	29.85	WAM
Video Prediction	KTH	Pred	20	WAM
Video Prediction	KTH	SSIM	0.893	WAM
Video Generation	BAIR Robot Pushing	Cond	2	WAM
Video Generation	BAIR Robot Pushing	FVD score	159.6	WAM
Video Generation	BAIR Robot Pushing	LPIPS	0.0936	WAM
Video Generation	BAIR Robot Pushing	PSNR	21.02	WAM
Video Generation	BAIR Robot Pushing	Pred	28	WAM
Video Generation	BAIR Robot Pushing	SSIM	0.844	WAM
Video Generation	BAIR Robot Pushing	Train	14	WAM
