Stochastic Video Generation with a Learned Prior

Emily Denton, Rob Fergus

2018-02-21 | ICML 2018 7 | Video Prediction | Video Generation

Abstract

Generating video frames that accurately predict future world states is challenging. Existing approaches either fail to capture the full distribution of outcomes, or yield blurry generations, or both. In this paper we introduce an unsupervised video generation model that learns a prior model of uncertainty in a given environment. Video frames are generated by drawing samples from this prior and combining them with a deterministic estimate of the future frame. The approach is simple and easily trained end-to-end on a variety of datasets. Sample generations are both varied and sharp, even many frames into the future, and compare favorably to those from existing approaches.
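The generation step described in the abstract (draw a latent sample from a prior, then combine it with a deterministic estimate of the next frame) can be sketched in a few lines. This is a toy illustration only, not the authors' architecture: `toy_prior`, `sample_latent`, `toy_predictor`, and `generate` are hypothetical names, the "frames" are short lists of floats, and the prior and predictor are fixed hand-written functions standing in for trained networks.

```python
import math
import random


def toy_prior(prev_frame):
    """Hypothetical stand-in for a learned prior.

    Maps the previous frame to the mean and log-variance of a Gaussian
    over a scalar latent z. A trained model would predict both values.
    """
    mean = sum(prev_frame) / len(prev_frame)
    log_var = -2.0  # assumed fixed uncertainty, for illustration only
    return mean, log_var


def sample_latent(mean, log_var, rng):
    """Draw z ~ N(mean, exp(log_var)) as z = mean + std * eps."""
    std = math.exp(0.5 * log_var)
    return mean + std * rng.gauss(0.0, 1.0)


def toy_predictor(prev_frame, z):
    """Hypothetical deterministic frame predictor.

    Given the previous frame and a sampled latent, the output is fully
    determined; all randomness enters through z.
    """
    return [0.9 * pixel + 0.1 * z for pixel in prev_frame]


def generate(context_frame, num_steps, seed):
    """Roll out num_steps future frames from one conditioning frame."""
    rng = random.Random(seed)
    frames = []
    prev = list(context_frame)
    for _ in range(num_steps):
        mean, log_var = toy_prior(prev)        # prior depends on past frames
        z = sample_latent(mean, log_var, rng)  # stochastic part
        prev = toy_predictor(prev, z)          # deterministic part
        frames.append(prev)
    return frames


if __name__ == "__main__":
    context = [0.0, 0.5, 1.0]
    for seed in (0, 1):
        rollout = generate(context, num_steps=3, seed=seed)
        print(seed, [round(p, 3) for p in rollout[-1]])
```

Running the rollout with different seeds yields different futures from the same conditioning frame, which is the sense in which samples are "varied"; fixing the seed reproduces a rollout exactly because the predictor itself is deterministic.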

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video | BAIR Robot Pushing | Cond | 2 | SVG (from SRVP) |
| Video | BAIR Robot Pushing | Pred | 28 | SVG (from SRVP) |
| Video | BAIR Robot Pushing | Train | 12 | SVG (from SRVP) |
| Video | BAIR Robot Pushing | Cond | 2 | SVG-LP (from vRNN) |
| Video | BAIR Robot Pushing | FVD score | 256.62 | SVG-LP (from vRNN) |
| Video | BAIR Robot Pushing | Pred | 28 | SVG-LP (from vRNN) |
| Video | BAIR Robot Pushing | Train | 10 | SVG-LP (from vRNN) |
| Video | BAIR Robot Pushing | Cond | 2 | SVG-FP (from FVD) |
| Video | BAIR Robot Pushing | FVD score | 315.5 | SVG-FP (from FVD) |
| Video | BAIR Robot Pushing | Pred | 14 | SVG-FP (from FVD) |
| Video | BAIR Robot Pushing | Train | 14 | SVG-FP (from FVD) |
| Video | KTH | Cond | 10 | SVG-LP (from Grid-keypoints) |
| Video | KTH | FVD | 157.9 | SVG-LP (from Grid-keypoints) |
| Video | KTH | LPIPS | 0.129 | SVG-LP (from Grid-keypoints) |
| Video | KTH | PSNR | 23.91 | SVG-LP (from Grid-keypoints) |
| Video | KTH | Params (M) | 22.8 | SVG-LP (from Grid-keypoints) |
| Video | KTH | Pred | 40 | SVG-LP (from Grid-keypoints) |
| Video | KTH | SSIM | 0.8 | SVG-LP (from Grid-keypoints) |
| Video | KTH | Train | 10 | SVG-LP (from Grid-keypoints) |
| Video | KTH | Cond | 10 | SVG-LP (from SRVP) |
| Video | KTH | Pred | 30 | SVG-LP (from SRVP) |
| Video | KTH | Train | 10 | SVG-LP (from SRVP) |
| Video | SynpickVP | LPIPS | 0.066 | SVG-LP |
| Video | SynpickVP | MSE | 51.82 | SVG-LP |
| Video | SynpickVP | SSIM | 0.886 | SVG-LP |
| Video | SynpickVP | LPIPS | 0.068 | SVG-Det |
| Video | SynpickVP | MSE | 60.6 | SVG-Det |
| Video | SynpickVP | PSNR | 26.92 | SVG-Det |
| Video | SynpickVP | SSIM | 0.879 | SVG-Det |
| Video | Cityscapes 128x128 | Cond. | 2 | SVG (from Hier-VRNN) |
| Video | Cityscapes 128x128 | FVD | 1300.26 | SVG (from Hier-VRNN) |
| Video | Cityscapes 128x128 | Pred | 28 | SVG (from Hier-VRNN) |
| Video | Cityscapes 128x128 | Train | 10 | SVG (from Hier-VRNN) |
| Video Prediction | KTH | Cond | 10 | SVG-LP (from Grid-keypoints) |
| Video Prediction | KTH | FVD | 157.9 | SVG-LP (from Grid-keypoints) |
| Video Prediction | KTH | LPIPS | 0.129 | SVG-LP (from Grid-keypoints) |
| Video Prediction | KTH | PSNR | 23.91 | SVG-LP (from Grid-keypoints) |
| Video Prediction | KTH | Params (M) | 22.8 | SVG-LP (from Grid-keypoints) |
| Video Prediction | KTH | Pred | 40 | SVG-LP (from Grid-keypoints) |
| Video Prediction | KTH | SSIM | 0.8 | SVG-LP (from Grid-keypoints) |
| Video Prediction | KTH | Train | 10 | SVG-LP (from Grid-keypoints) |
| Video Prediction | KTH | Cond | 10 | SVG-LP (from SRVP) |
| Video Prediction | KTH | Pred | 30 | SVG-LP (from SRVP) |
| Video Prediction | KTH | Train | 10 | SVG-LP (from SRVP) |
| Video Prediction | SynpickVP | LPIPS | 0.066 | SVG-LP |
| Video Prediction | SynpickVP | MSE | 51.82 | SVG-LP |
| Video Prediction | SynpickVP | SSIM | 0.886 | SVG-LP |
| Video Prediction | SynpickVP | LPIPS | 0.068 | SVG-Det |
| Video Prediction | SynpickVP | MSE | 60.6 | SVG-Det |
| Video Prediction | SynpickVP | PSNR | 26.92 | SVG-Det |
| Video Prediction | SynpickVP | SSIM | 0.879 | SVG-Det |
| Video Prediction | Cityscapes 128x128 | Cond. | 2 | SVG (from Hier-VRNN) |
| Video Prediction | Cityscapes 128x128 | FVD | 1300.26 | SVG (from Hier-VRNN) |
| Video Prediction | Cityscapes 128x128 | Pred | 28 | SVG (from Hier-VRNN) |
| Video Prediction | Cityscapes 128x128 | Train | 10 | SVG (from Hier-VRNN) |
| Video Generation | BAIR Robot Pushing | Cond | 2 | SVG (from SRVP) |
| Video Generation | BAIR Robot Pushing | Pred | 28 | SVG (from SRVP) |
| Video Generation | BAIR Robot Pushing | Train | 12 | SVG (from SRVP) |
| Video Generation | BAIR Robot Pushing | Cond | 2 | SVG-LP (from vRNN) |
| Video Generation | BAIR Robot Pushing | FVD score | 256.62 | SVG-LP (from vRNN) |
| Video Generation | BAIR Robot Pushing | Pred | 28 | SVG-LP (from vRNN) |
| Video Generation | BAIR Robot Pushing | Train | 10 | SVG-LP (from vRNN) |
| Video Generation | BAIR Robot Pushing | Cond | 2 | SVG-FP (from FVD) |
| Video Generation | BAIR Robot Pushing | FVD score | 315.5 | SVG-FP (from FVD) |
| Video Generation | BAIR Robot Pushing | Pred | 14 | SVG-FP (from FVD) |
| Video Generation | BAIR Robot Pushing | Train | 14 | SVG-FP (from FVD) |
