Stochastic Variational Video Prediction

Mohammad Babaeizadeh, Chelsea Finn, Dumitru Erhan, Roy H. Campbell, Sergey Levine

2017-10-30 | ICLR 2018 | Video Prediction, Prediction, Video Generation

Abstract

Predicting the future in real-world settings, particularly from raw sensory observations such as images, is exceptionally challenging. Real-world events can be stochastic and unpredictable, and the high dimensionality and complexity of natural images require the predictive model to build an intricate understanding of the natural world. Many existing methods tackle this problem by making simplifying assumptions about the environment. One common assumption is that the outcome is deterministic and there is only one plausible future. This can lead to low-quality predictions in real-world settings with stochastic dynamics. In this paper, we develop a stochastic variational video prediction (SV2P) method that predicts a different possible future for each sample of its latent variables. To the best of our knowledge, our model is the first to provide effective stochastic multi-frame prediction for real-world video. We demonstrate the capability of the proposed method in predicting detailed future frames of videos on multiple real-world datasets, both action-free and action-conditioned. We find that our proposed method produces substantially improved video predictions when compared to the same model without stochasticity, and to other stochastic video prediction methods. Our SV2P implementation will be open-sourced upon publication.
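
The central idea in the abstract, that each sample of the latent variables yields a different predicted future, can be illustrated with a toy sketch. This is not the SV2P architecture: the function names, the 1-D "frames", the standard-normal prior, and the choice to sample the latent once per rollout and hold it fixed across time steps are all assumptions made only for illustration.

```python
import random


def sample_latent(dim, rng):
    """Draw a latent vector z from a standard normal prior (an assumed prior)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def toy_predictor(context_frames, z, num_future):
    """Hypothetical stand-in for a learned predictor (not the paper's network).

    Frames are lists of floats. Each future frame is the previous frame
    shifted by a per-pixel offset taken from z, so for a given context the
    latent alone decides which future is produced.
    """
    frame = list(context_frames[-1])
    future = []
    for _ in range(num_future):
        frame = [pixel + offset for pixel, offset in zip(frame, z)]
        future.append(frame)
    return future


def predict_futures(context_frames, num_samples, num_future, seed=0):
    """Sample one latent per rollout and return one predicted future per sample."""
    rng = random.Random(seed)
    dim = len(context_frames[-1])
    return [
        toy_predictor(context_frames, sample_latent(dim, rng), num_future)
        for _ in range(num_samples)
    ]


# Two context frames of three "pixels" each; three sampled futures of four frames.
context = [[0.0, 0.0, 0.0], [0.1, 0.2, 0.3]]
futures = predict_futures(context, num_samples=3, num_future=4)
```

With the latent fixed, `toy_predictor` is deterministic, which mirrors the "same model without stochasticity" baseline mentioned in the abstract only in spirit; the variation across `futures` comes entirely from the sampled latents.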

Results

Task	Dataset	Metric	Value	Model
Video	BAIR Robot Pushing	Cond	2	SV2P (from FVD)
Video	BAIR Robot Pushing	FVD score	262.5	SV2P (from FVD)
Video	BAIR Robot Pushing	Pred	14	SV2P (from FVD)
Video	BAIR Robot Pushing	Train	14	SV2P (from FVD)
Video	BAIR Robot Pushing	Cond	2	SV2P (from SRVP)
Video	BAIR Robot Pushing	Pred	28	SV2P (from SRVP)
Video	BAIR Robot Pushing	Train	12	SV2P (from SRVP)
Video	KTH	Cond	10	SV2P time-invariant (from Grid-keypoints)
Video	KTH	FVD	209.5	SV2P time-invariant (from Grid-keypoints)
Video	KTH	LPIPS	0.232	SV2P time-invariant (from Grid-keypoints)
Video	KTH	PSNR	25.87	SV2P time-invariant (from Grid-keypoints)
Video	KTH	Params (M)	8.3	SV2P time-invariant (from Grid-keypoints)
Video	KTH	Pred	40	SV2P time-invariant (from Grid-keypoints)
Video	KTH	SSIM	0.782	SV2P time-invariant (from Grid-keypoints)
Video	KTH	Train	10	SV2P time-invariant (from Grid-keypoints)
Video	KTH	Cond	10	SV2P time-invariant (from Grid-keypoints)
Video	KTH	FVD	253.5	SV2P time-invariant (from Grid-keypoints)
Video	KTH	LPIPS	0.26	SV2P time-invariant (from Grid-keypoints)
Video	KTH	PSNR	25.7	SV2P time-invariant (from Grid-keypoints)
Video	KTH	Params (M)	8.3	SV2P time-invariant (from Grid-keypoints)
Video	KTH	Pred	40	SV2P time-invariant (from Grid-keypoints)
Video	KTH	SSIM	0.772	SV2P time-invariant (from Grid-keypoints)
Video	KTH	Train	10	SV2P time-invariant (from Grid-keypoints)
Video	KTH	Cond	10	SV2P (from SRVP)
Video	KTH	Pred	30	SV2P (from SRVP)
Video	KTH	SSIM	0.838	SV2P (from SRVP)
Video	KTH	Train	10	SV2P (from SRVP)
Video Prediction	KTH	Cond	10	SV2P time-invariant (from Grid-keypoints)
Video Prediction	KTH	FVD	209.5	SV2P time-invariant (from Grid-keypoints)
Video Prediction	KTH	LPIPS	0.232	SV2P time-invariant (from Grid-keypoints)
Video Prediction	KTH	PSNR	25.87	SV2P time-invariant (from Grid-keypoints)
Video Prediction	KTH	Params (M)	8.3	SV2P time-invariant (from Grid-keypoints)
Video Prediction	KTH	Pred	40	SV2P time-invariant (from Grid-keypoints)
Video Prediction	KTH	SSIM	0.782	SV2P time-invariant (from Grid-keypoints)
Video Prediction	KTH	Train	10	SV2P time-invariant (from Grid-keypoints)
Video Prediction	KTH	Cond	10	SV2P time-invariant (from Grid-keypoints)
Video Prediction	KTH	FVD	253.5	SV2P time-invariant (from Grid-keypoints)
Video Prediction	KTH	LPIPS	0.26	SV2P time-invariant (from Grid-keypoints)
Video Prediction	KTH	PSNR	25.7	SV2P time-invariant (from Grid-keypoints)
Video Prediction	KTH	Params (M)	8.3	SV2P time-invariant (from Grid-keypoints)
Video Prediction	KTH	Pred	40	SV2P time-invariant (from Grid-keypoints)
Video Prediction	KTH	SSIM	0.772	SV2P time-invariant (from Grid-keypoints)
Video Prediction	KTH	Train	10	SV2P time-invariant (from Grid-keypoints)
Video Prediction	KTH	Cond	10	SV2P (from SRVP)
Video Prediction	KTH	Pred	30	SV2P (from SRVP)
Video Prediction	KTH	SSIM	0.838	SV2P (from SRVP)
Video Prediction	KTH	Train	10	SV2P (from SRVP)
Video Generation	BAIR Robot Pushing	Cond	2	SV2P (from FVD)
Video Generation	BAIR Robot Pushing	FVD score	262.5	SV2P (from FVD)
Video Generation	BAIR Robot Pushing	Pred	14	SV2P (from FVD)
Video Generation	BAIR Robot Pushing	Train	14	SV2P (from FVD)
Video Generation	BAIR Robot Pushing	Cond	2	SV2P (from SRVP)
Video Generation	BAIR Robot Pushing	Pred	28	SV2P (from SRVP)
Video Generation	BAIR Robot Pushing	Train	12	SV2P (from SRVP)
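
For readers interpreting the PSNR rows, the standard definition is PSNR = 10 log10(MAX^2 / MSE), reported in decibels, where higher is better. The following is a minimal sketch, assuming frames are flat lists of pixel values in the range [0, max_value]; the function name `psnr` and the sample values are illustrative, and the evaluation protocol behind the numbers in the table is not specified here.

```python
import math


def psnr(reference, prediction, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    if not reference or len(reference) != len(prediction):
        raise ValueError("frames must be non-empty and the same length")
    # Mean squared error over all pixels.
    mse = sum((r - p) ** 2 for r, p in zip(reference, prediction)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


# A uniform error of 0.1 on a [0, 1] scale gives an MSE of 0.01.
score = psnr([0.0, 0.0], [0.1, 0.1])
```

Under these assumptions, a uniform per-pixel error of 0.1 corresponds to roughly 20 dB, which gives a sense of scale for the values near 25.7 to 25.87 listed for KTH.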
