Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Stochastic Adversarial Video Prediction

Alex X. Lee, Richard Zhang, Frederik Ebert, Pieter Abbeel, Chelsea Finn, Sergey Levine

2018-04-04 | ICLR 2019 | Tasks: Representation Learning, Video Prediction, Prediction, Video Generation

Abstract

Being able to predict what may happen in the future requires an in-depth understanding of the physical and causal rules that govern the world. A model that is able to do so has a number of appealing applications, from robotic planning to representation learning. However, learning to predict raw future observations, such as frames in a video, is exceedingly challenging -- the ambiguous nature of the problem can cause a naively designed model to average together possible futures into a single, blurry prediction. Recently, this has been addressed by two distinct approaches: (a) latent variational variable models that explicitly model underlying stochasticity and (b) adversarially-trained models that aim to produce naturalistic images. However, a standard latent variable model can struggle to produce realistic results, and a standard adversarially-trained model underutilizes latent variables and fails to produce diverse predictions. We show that these distinct methods are in fact complementary. Combining the two produces predictions that look more realistic to human raters and better cover the range of possible futures. Our method outperforms prior and concurrent work in these aspects.
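The abstract's central idea is that a variational latent-variable objective and an adversarial objective are complementary. A loss computation makes this concrete. The sketch below is a minimal NumPy illustration under stated assumptions, not the authors' implementation. It assumes predicted and ground-truth frames as arrays, a diagonal-Gaussian posterior described by hypothetical `mu` and `logvar` arrays, and discriminator logits on the predicted frames. The weights `lambda_l1`, `lambda_kl`, and `lambda_gan` are illustrative placeholders, not values from the paper.

```python
import numpy as np


def l1_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean absolute error between predicted and ground-truth frames."""
    return float(np.mean(np.abs(pred - target)))


def kl_to_standard_normal(mu: np.ndarray, logvar: np.ndarray) -> float:
    """KL(N(mu, diag(exp(logvar))) || N(0, I)): summed over latent dims, averaged over the batch."""
    kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)
    return float(np.mean(kl))


def generator_gan_loss(fake_logits: np.ndarray) -> float:
    """Non-saturating generator loss -log(sigmoid(logit)), written as softplus(-logit) for stability."""
    return float(np.mean(np.logaddexp(0.0, -fake_logits)))


def combined_generator_loss(pred, target, mu, logvar, fake_logits,
                            lambda_l1=1.0, lambda_kl=1.0, lambda_gan=1.0):
    """Weighted sum of the VAE terms (L1 + KL) and the adversarial term.

    The lambda_* weights are illustrative placeholders, not values from the paper.
    """
    return (lambda_l1 * l1_loss(pred, target)
            + lambda_kl * kl_to_standard_normal(mu, logvar)
            + lambda_gan * generator_gan_loss(fake_logits))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pred = rng.random((4, 64, 64, 3))    # 4 hypothetical 64x64 RGB predicted frames
    target = rng.random((4, 64, 64, 3))  # matching ground-truth frames
    mu = rng.normal(size=(4, 8))         # hypothetical 8-dim posterior means
    logvar = np.zeros((4, 8))            # unit posterior variance
    fake_logits = rng.normal(size=4)     # discriminator logits on the predictions
    print(combined_generator_loss(pred, target, mu, logvar, fake_logits))
```

The L1 and KL terms form the variational (VAE) part, which ties each prediction to a latent sample. The adversarial term pushes predictions toward realistic frames. The paper's full objective has more structure than this, for example in how latent samples reach the discriminator, and this sketch omits it.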

Results

| Task | Dataset | Model | Cond | Pred | Train | FVD | PSNR | SSIM | LPIPS | Params (M) |
|---|---|---|---|---|---|---|---|---|---|---|
| Video; Video Generation | BAIR Robot Pushing | SAVP (from FVD) | 2 | 14 | 14 | 116.4 | - | - | - | - |
| Video; Video Generation | BAIR Robot Pushing | SAVP (from vRNN) | 2 | 28 | 10 | 143.43 | - | - | - | - |
| Video; Video Generation | BAIR Robot Pushing | SAVP (from SRVP) | 2 | 28 | 12 | - | - | - | - | - |
| Video; Video Generation | BAIR Robot Pushing | SAVP-VAE (from WAM) | 2 | 28 | 14 | - | 19.09 | 0.815 | - | - |
| Video; Video Prediction | KTH | SAVP-VAE (from Grid-keypoints) | 10 | 40 | 10 | 145.7 | 26 | 0.806 | 0.116 | 7.3 |
| Video; Video Prediction | KTH | SAVP (from Grid-keypoints) | 10 | 40 | 10 | 183.7 | 23.79 | 0.699 | 0.126 | 17.6 |
| Video; Video Prediction | KTH | SAVP (from SRVP) | 10 | 30 | 10 | - | - | - | - | - |
| Video; Video Prediction | KTH | SAVP-VAE | 10 | 20 | - | - | 27.77 | 0.852 | - | - |

Each row appears under two task labels (Video and either Video Generation or Video Prediction) with identical values. Cond, Pred, and Train give the numbers of conditioning frames, predicted frames, and frames used in training. "(from X)" marks results reported in paper X. Params (M) is the parameter count in millions. Lower is better for FVD and LPIPS; higher is better for PSNR and SSIM. The FVD column is labeled "FVD score" for BAIR Robot Pushing in the source listing.
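PSNR and SSIM in the results compare predicted frames with ground truth. A stochastic model can produce many plausible futures, so such comparisons are often made on the best of several sampled predictions. The sketch below shows PSNR and a best-of-N selection in NumPy. It assumes frames scaled to [0, 1] and a hypothetical array `samples` of shape (N, ...) holding N sampled predictions for one video. The number of samples and the exact protocol behind each table entry come from the cited papers and are not reproduced here.

```python
import numpy as np


def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    mse = float(np.mean((pred - target) ** 2))
    if mse == 0.0:
        return float("inf")  # identical frames
    return float(10.0 * np.log10(max_val**2 / mse))


def best_of_n_psnr(samples: np.ndarray, target: np.ndarray) -> tuple[int, float]:
    """Score each sampled future against the ground truth and keep the best one."""
    scores = [psnr(s, target) for s in samples]
    best = int(np.argmax(scores))
    return best, scores[best]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.random((10, 64, 64, 3))  # hypothetical 10-frame ground-truth clip
    # Hypothetical stochastic samples: the ground truth plus noise of varying strength.
    samples = np.stack([np.clip(target + rng.normal(scale=s, size=target.shape), 0, 1)
                        for s in (0.05, 0.2, 0.1)])
    print(best_of_n_psnr(samples, target))
```

Best-of-N scoring rewards a model whose samples cover the true future, which is one way to test the abstract's diversity claim. It does not penalize implausible samples, so distribution-level metrics such as FVD are reported as well.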

Related Papers

Multi-Strategy Improved Snake Optimizer Accelerated CNN-LSTM-Attention-Adaboost for Trajectory Prediction (2025-07-21)
Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper (2025-07-20)
Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
Boosting Team Modeling through Tempo-Relational Representation Learning (2025-07-17)
World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving (2025-07-17)
Leveraging Pre-Trained Visual Models for AI-Generated Video Detection (2025-07-17)
Taming Diffusion Transformer for Real-Time Mobile Video Generation (2025-07-17)
LoViC: Efficient Long Video Generation with Context Compression (2025-07-17)