Stochastic Video Generation with a Learned Prior

Emily Denton, Rob Fergus

2018-02-21 | ICML 2018 7 | Video Prediction | Video Generation

Abstract

Generating video frames that accurately predict future world states is challenging. Existing approaches either fail to capture the full distribution of outcomes, or yield blurry generations, or both. In this paper we introduce an unsupervised video generation model that learns a prior model of uncertainty in a given environment. Video frames are generated by drawing samples from this prior and combining them with a deterministic estimate of the future frame. The approach is simple and easily trained end-to-end on a variety of datasets. Sample generations are both varied and sharp, even many frames into the future, and compare favorably to those from existing approaches.
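
The generation loop described in the abstract can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the authors' implementation: frames are short lists of floats, and `learned_prior`, `sample_latent`, `predict_frame` and `generate` are hypothetical stand-ins with hand-picked constants in place of trained networks. It shows only the control flow: at each step a latent is drawn from a prior conditioned on the previous frame, then combined with a deterministic estimate of the next frame.

```python
import math
import random


def learned_prior(prev_frame):
    """Hypothetical stand-in for a learned prior network.

    Maps the previous frame to the mean and log-variance of a diagonal
    Gaussian over the latent variable. The constants are arbitrary.
    """
    mean_intensity = sum(prev_frame) / len(prev_frame)
    mu = [0.1 * mean_intensity]
    logvar = [-2.0]
    return mu, logvar


def sample_latent(mu, logvar, rng):
    # Reparameterized draw: z = mu + sigma * eps, with eps ~ N(0, 1).
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def predict_frame(prev_frame, z):
    """Hypothetical deterministic frame predictor.

    Combines a deterministic estimate (a decayed copy of the previous
    frame) with the sampled latent, which acts as a shared offset.
    """
    return [0.9 * p + z[0] for p in prev_frame]


def generate(context_frame, steps, seed):
    """Roll the toy model forward, feeding each output back in as input."""
    rng = random.Random(seed)
    frames = [list(context_frame)]
    for _ in range(steps):
        mu, logvar = learned_prior(frames[-1])
        z = sample_latent(mu, logvar, rng)
        frames.append(predict_frame(frames[-1], z))
    return frames[1:]  # drop the conditioning frame
```

Calling `generate([0.5, 0.2], steps=3, seed=0)` and `generate([0.5, 0.2], steps=3, seed=1)` yields two different three-frame rollouts from the same conditioning frame, which is the sense in which the sampled generations are varied. In this sketch, fixing the seed makes a rollout reproducible.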

Results

Task	Dataset	Metric	Value	Model
Video	BAIR Robot Pushing	Cond	2	SVG (from SRVP)
Video	BAIR Robot Pushing	Pred	28	SVG (from SRVP)
Video	BAIR Robot Pushing	Train	12	SVG (from SRVP)
Video	BAIR Robot Pushing	Cond	2	SVG-LP (from vRNN)
Video	BAIR Robot Pushing	FVD score	256.62	SVG-LP (from vRNN)
Video	BAIR Robot Pushing	Pred	28	SVG-LP (from vRNN)
Video	BAIR Robot Pushing	Train	10	SVG-LP (from vRNN)
Video	BAIR Robot Pushing	Cond	2	SVG-FP (from FVD)
Video	BAIR Robot Pushing	FVD score	315.5	SVG-FP (from FVD)
Video	BAIR Robot Pushing	Pred	14	SVG-FP (from FVD)
Video	BAIR Robot Pushing	Train	14	SVG-FP (from FVD)
Video	KTH	Cond	10	SVG-LP (from Grid-keypoints)
Video	KTH	FVD	157.9	SVG-LP (from Grid-keypoints)
Video	KTH	LPIPS	0.129	SVG-LP (from Grid-keypoints)
Video	KTH	PSNR	23.91	SVG-LP (from Grid-keypoints)
Video	KTH	Params (M)	22.8	SVG-LP (from Grid-keypoints)
Video	KTH	Pred	40	SVG-LP (from Grid-keypoints)
Video	KTH	SSIM	0.8	SVG-LP (from Grid-keypoints)
Video	KTH	Train	10	SVG-LP (from Grid-keypoints)
Video	KTH	Cond	10	SVG-LP (from SRVP)
Video	KTH	Pred	30	SVG-LP (from SRVP)
Video	KTH	Train	10	SVG-LP (from SRVP)
Video	SynpickVP	LPIPS	0.066	SVG-LP
Video	SynpickVP	MSE	51.82	SVG-LP
Video	SynpickVP	SSIM	0.886	SVG-LP
Video	SynpickVP	LPIPS	0.068	SVG-Det
Video	SynpickVP	MSE	60.6	SVG-Det
Video	SynpickVP	PSNR	26.92	SVG-Det
Video	SynpickVP	SSIM	0.879	SVG-Det
Video	Cityscapes 128x128	Cond	2	SVG (from Hier-VRNN)
Video	Cityscapes 128x128	FVD	1300.26	SVG (from Hier-VRNN)
Video	Cityscapes 128x128	Pred	28	SVG (from Hier-VRNN)
Video	Cityscapes 128x128	Train	10	SVG (from Hier-VRNN)
Video Prediction	KTH	Cond	10	SVG-LP (from Grid-keypoints)
Video Prediction	KTH	FVD	157.9	SVG-LP (from Grid-keypoints)
Video Prediction	KTH	LPIPS	0.129	SVG-LP (from Grid-keypoints)
Video Prediction	KTH	PSNR	23.91	SVG-LP (from Grid-keypoints)
Video Prediction	KTH	Params (M)	22.8	SVG-LP (from Grid-keypoints)
Video Prediction	KTH	Pred	40	SVG-LP (from Grid-keypoints)
Video Prediction	KTH	SSIM	0.8	SVG-LP (from Grid-keypoints)
Video Prediction	KTH	Train	10	SVG-LP (from Grid-keypoints)
Video Prediction	KTH	Cond	10	SVG-LP (from SRVP)
Video Prediction	KTH	Pred	30	SVG-LP (from SRVP)
Video Prediction	KTH	Train	10	SVG-LP (from SRVP)
Video Prediction	SynpickVP	LPIPS	0.066	SVG-LP
Video Prediction	SynpickVP	MSE	51.82	SVG-LP
Video Prediction	SynpickVP	SSIM	0.886	SVG-LP
Video Prediction	SynpickVP	LPIPS	0.068	SVG-Det
Video Prediction	SynpickVP	MSE	60.6	SVG-Det
Video Prediction	SynpickVP	PSNR	26.92	SVG-Det
Video Prediction	SynpickVP	SSIM	0.879	SVG-Det
Video Prediction	Cityscapes 128x128	Cond	2	SVG (from Hier-VRNN)
Video Prediction	Cityscapes 128x128	FVD	1300.26	SVG (from Hier-VRNN)
Video Prediction	Cityscapes 128x128	Pred	28	SVG (from Hier-VRNN)
Video Prediction	Cityscapes 128x128	Train	10	SVG (from Hier-VRNN)
Video Generation	BAIR Robot Pushing	Cond	2	SVG (from SRVP)
Video Generation	BAIR Robot Pushing	Pred	28	SVG (from SRVP)
Video Generation	BAIR Robot Pushing	Train	12	SVG (from SRVP)
Video Generation	BAIR Robot Pushing	Cond	2	SVG-LP (from vRNN)
Video Generation	BAIR Robot Pushing	FVD score	256.62	SVG-LP (from vRNN)
Video Generation	BAIR Robot Pushing	Pred	28	SVG-LP (from vRNN)
Video Generation	BAIR Robot Pushing	Train	10	SVG-LP (from vRNN)
Video Generation	BAIR Robot Pushing	Cond	2	SVG-FP (from FVD)
Video Generation	BAIR Robot Pushing	FVD score	315.5	SVG-FP (from FVD)
Video Generation	BAIR Robot Pushing	Pred	14	SVG-FP (from FVD)
Video Generation	BAIR Robot Pushing	Train	14	SVG-FP (from FVD)
