Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Stochastic Variational Video Prediction

Mohammad Babaeizadeh, Chelsea Finn, Dumitru Erhan, Roy H. Campbell, Sergey Levine

Published: 2017-10-30 | Venue: ICLR 2018 | Tasks: Video Prediction, Prediction, Video Generation

Abstract

Predicting the future in real-world settings, particularly from raw sensory observations such as images, is exceptionally challenging. Real-world events can be stochastic and unpredictable, and the high dimensionality and complexity of natural images requires the predictive model to build an intricate understanding of the natural world. Many existing methods tackle this problem by making simplifying assumptions about the environment. One common assumption is that the outcome is deterministic and there is only one plausible future. This can lead to low-quality predictions in real-world settings with stochastic dynamics. In this paper, we develop a stochastic variational video prediction (SV2P) method that predicts a different possible future for each sample of its latent variables. To the best of our knowledge, our model is the first to provide effective stochastic multi-frame prediction for real-world video. We demonstrate the capability of the proposed method in predicting detailed future frames of videos on multiple real-world datasets, both action-free and action-conditioned. We find that our proposed method produces substantially improved video predictions when compared to the same model without stochasticity, and to other stochastic video prediction methods. Our SV2P implementation will be open sourced upon publication.
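
The central idea in the abstract, one sampled latent per predicted future, can be illustrated with a toy sketch. This is not the SV2P network: `predict_future`, `sample_futures`, the two-dimensional latent, and the pixel-shift "decoder" are hypothetical stand-ins chosen so the example runs with NumPy alone, and the standard normal prior is an assumption made for illustration.

```python
import numpy as np


def predict_future(context, z, num_frames):
    """Toy latent-conditioned predictor (a stand-in, not the SV2P model).

    context: array of shape (T_cond, H, W) holding the observed frames.
    z: array of shape (2,), read here as a per-step (dy, dx) drift.
    Returns an array of shape (num_frames, H, W).
    """
    # Assumption: scale the latent by 2 and round it to whole-pixel shifts.
    shift = np.rint(2.0 * np.asarray(z)).astype(int)
    dy, dx = int(shift[0]), int(shift[1])
    frame = context[-1]
    future = []
    for _ in range(num_frames):
        # The same latent is reused at every step, so one sample
        # determines one complete future trajectory.
        frame = np.roll(frame, shift=(dy, dx), axis=(0, 1))
        future.append(frame)
    return np.stack(future)


def sample_futures(context, num_samples, num_frames, seed=0):
    """Draw one latent per sample from N(0, I) and decode each into a future."""
    rng = np.random.default_rng(seed)
    latents = rng.standard_normal(size=(num_samples, 2))
    return np.stack([predict_future(context, z, num_frames) for z in latents])


# Two conditioning frames containing a single bright pixel.
context = np.zeros((2, 8, 8))
context[-1, 4, 4] = 1.0
futures = sample_futures(context, num_samples=4, num_frames=3)
print(futures.shape)  # (num_samples, num_frames, H, W)
```

With the latent fixed at zero, this toy predictor simply repeats the last frame, which mirrors the deterministic baseline the abstract compares against: a single plausible future regardless of how many times it is queried.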

Results

Task | Dataset | Metric | Value | Model
Video | BAIR Robot Pushing | Cond | 2 | SV2P (from FVD)
Video | BAIR Robot Pushing | FVD score | 262.5 | SV2P (from FVD)
Video | BAIR Robot Pushing | Pred | 14 | SV2P (from FVD)
Video | BAIR Robot Pushing | Train | 14 | SV2P (from FVD)
Video | BAIR Robot Pushing | Cond | 2 | SV2P (from SRVP)
Video | BAIR Robot Pushing | Pred | 28 | SV2P (from SRVP)
Video | BAIR Robot Pushing | Train | 12 | SV2P (from SRVP)
Video | KTH | Cond | 10 | SV2P time-invariant (from Grid-keypoints)
Video | KTH | FVD | 209.5 | SV2P time-invariant (from Grid-keypoints)
Video | KTH | LPIPS | 0.232 | SV2P time-invariant (from Grid-keypoints)
Video | KTH | PSNR | 25.87 | SV2P time-invariant (from Grid-keypoints)
Video | KTH | Params (M) | 8.3 | SV2P time-invariant (from Grid-keypoints)
Video | KTH | Pred | 40 | SV2P time-invariant (from Grid-keypoints)
Video | KTH | SSIM | 0.782 | SV2P time-invariant (from Grid-keypoints)
Video | KTH | Train | 10 | SV2P time-invariant (from Grid-keypoints)
Video | KTH | Cond | 10 | SV2P time-invariant (from Grid-keypoints)
Video | KTH | FVD | 253.5 | SV2P time-invariant (from Grid-keypoints)
Video | KTH | LPIPS | 0.26 | SV2P time-invariant (from Grid-keypoints)
Video | KTH | PSNR | 25.7 | SV2P time-invariant (from Grid-keypoints)
Video | KTH | Params (M) | 8.3 | SV2P time-invariant (from Grid-keypoints)
Video | KTH | Pred | 40 | SV2P time-invariant (from Grid-keypoints)
Video | KTH | SSIM | 0.772 | SV2P time-invariant (from Grid-keypoints)
Video | KTH | Train | 10 | SV2P time-invariant (from Grid-keypoints)
Video | KTH | Cond | 10 | SV2P (from SRVP)
Video | KTH | Pred | 30 | SV2P (from SRVP)
Video | KTH | SSIM | 0.838 | SV2P (from SRVP)
Video | KTH | Train | 10 | SV2P (from SRVP)
Video Prediction | KTH | Cond | 10 | SV2P time-invariant (from Grid-keypoints)
Video Prediction | KTH | FVD | 209.5 | SV2P time-invariant (from Grid-keypoints)
Video Prediction | KTH | LPIPS | 0.232 | SV2P time-invariant (from Grid-keypoints)
Video Prediction | KTH | PSNR | 25.87 | SV2P time-invariant (from Grid-keypoints)
Video Prediction | KTH | Params (M) | 8.3 | SV2P time-invariant (from Grid-keypoints)
Video Prediction | KTH | Pred | 40 | SV2P time-invariant (from Grid-keypoints)
Video Prediction | KTH | SSIM | 0.782 | SV2P time-invariant (from Grid-keypoints)
Video Prediction | KTH | Train | 10 | SV2P time-invariant (from Grid-keypoints)
Video Prediction | KTH | Cond | 10 | SV2P time-invariant (from Grid-keypoints)
Video Prediction | KTH | FVD | 253.5 | SV2P time-invariant (from Grid-keypoints)
Video Prediction | KTH | LPIPS | 0.26 | SV2P time-invariant (from Grid-keypoints)
Video Prediction | KTH | PSNR | 25.7 | SV2P time-invariant (from Grid-keypoints)
Video Prediction | KTH | Params (M) | 8.3 | SV2P time-invariant (from Grid-keypoints)
Video Prediction | KTH | Pred | 40 | SV2P time-invariant (from Grid-keypoints)
Video Prediction | KTH | SSIM | 0.772 | SV2P time-invariant (from Grid-keypoints)
Video Prediction | KTH | Train | 10 | SV2P time-invariant (from Grid-keypoints)
Video Prediction | KTH | Cond | 10 | SV2P (from SRVP)
Video Prediction | KTH | Pred | 30 | SV2P (from SRVP)
Video Prediction | KTH | SSIM | 0.838 | SV2P (from SRVP)
Video Prediction | KTH | Train | 10 | SV2P (from SRVP)
Video Generation | BAIR Robot Pushing | Cond | 2 | SV2P (from FVD)
Video Generation | BAIR Robot Pushing | FVD score | 262.5 | SV2P (from FVD)
Video Generation | BAIR Robot Pushing | Pred | 14 | SV2P (from FVD)
Video Generation | BAIR Robot Pushing | Train | 14 | SV2P (from FVD)
Video Generation | BAIR Robot Pushing | Cond | 2 | SV2P (from SRVP)
Video Generation | BAIR Robot Pushing | Pred | 28 | SV2P (from SRVP)
Video Generation | BAIR Robot Pushing | Train | 12 | SV2P (from SRVP)
