Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Improved Conditional VRNNs for Video Prediction

Lluis Castrejon, Nicolas Ballas, Aaron Courville

Date: 2019-04-27
Venue: ICCV 2019 10
Tasks: Video Prediction, Prediction, Video Generation

Abstract

Predicting future frames for a video sequence is a challenging generative modeling task. Promising approaches include probabilistic latent variable models such as the Variational Auto-Encoder. While VAEs can handle uncertainty and model multiple possible future outcomes, they have a tendency to produce blurry predictions. In this work we argue that this is a sign of underfitting. To address this issue, we propose to increase the expressiveness of the latent distributions and to use higher capacity likelihood models. Our approach relies on a hierarchy of latent variables, which defines a family of flexible prior and posterior distributions in order to better model the probability of future sequences. We validate our proposal through a series of ablation experiments and compare our approach to current state-of-the-art latent variable models. Our method performs favorably under several metrics in three different datasets.
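
As a rough illustration of what a hierarchy of latent variables means for the training objective, the sketch below sums per-level KL terms between diagonal Gaussian posteriors and priors. This is not the authors' implementation: the function names `kl_diag_gaussians` and `hierarchical_kl`, the diagonal Gaussian assumption, and the plain-list representation of each level's parameters are all hypothetical choices made for the example. In a real hierarchical model, each level's parameters would typically depend on samples from the levels above it; here they are simply passed in.

```python
import math

def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between two diagonal Gaussians, summed over dimensions.

    Each argument is a list of floats; variances are given as log-variances.
    """
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        # 0.5 * [log(var_p / var_q) + (var_q + (mu_q - mu_p)^2) / var_p - 1]
        total += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return total

def hierarchical_kl(posteriors, priors):
    """Sum the KL terms over all latent levels (hypothetical helper).

    posteriors and priors are lists with one (mu, logvar) pair per level.
    """
    return sum(
        kl_diag_gaussians(mu_q, lv_q, mu_p, lv_p)
        for (mu_q, lv_q), (mu_p, lv_p) in zip(posteriors, priors)
    )

# Two latent levels with one dimension each (made-up numbers for illustration).
posteriors = [([1.0], [0.0]), ([0.0], [0.0])]
priors = [([0.0], [0.0]), ([0.0], [0.0])]
total_kl = hierarchical_kl(posteriors, priors)
```

In this toy setup only the first level's posterior differs from its prior, so only that level contributes to the total. A more expressive prior, as the abstract proposes, can sit closer to the posterior and so shrink this term.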

Results

Task | Dataset | Metric | Value | Model
Video | BAIR Robot Pushing | Cond | 2 | Hier-VRNN
Video | BAIR Robot Pushing | FVD score | 143.4 | Hier-VRNN
Video | BAIR Robot Pushing | Pred | 28 | Hier-VRNN
Video | BAIR Robot Pushing | Train | 10 | Hier-VRNN
Video | BAIR Robot Pushing | Cond | 2 | VRNN 1L
Video | BAIR Robot Pushing | FVD score | 149.22 | VRNN 1L
Video | BAIR Robot Pushing | Pred | 28 | VRNN 1L
Video | BAIR Robot Pushing | Train | 10 | VRNN 1L
Video | Cityscapes 128x128 | Cond. | 2 | Hier-VRNN
Video | Cityscapes 128x128 | FVD | 567.51 | Hier-VRNN
Video | Cityscapes 128x128 | Pred | 28 | Hier-VRNN
Video | Cityscapes 128x128 | Train | 10 | Hier-VRNN
Video Prediction | Cityscapes 128x128 | Cond. | 2 | Hier-VRNN
Video Prediction | Cityscapes 128x128 | FVD | 567.51 | Hier-VRNN
Video Prediction | Cityscapes 128x128 | Pred | 28 | Hier-VRNN
Video Prediction | Cityscapes 128x128 | Train | 10 | Hier-VRNN
Video Generation | BAIR Robot Pushing | Cond | 2 | Hier-VRNN
Video Generation | BAIR Robot Pushing | FVD score | 143.4 | Hier-VRNN
Video Generation | BAIR Robot Pushing | Pred | 28 | Hier-VRNN
Video Generation | BAIR Robot Pushing | Train | 10 | Hier-VRNN
Video Generation | BAIR Robot Pushing | Cond | 2 | VRNN 1L
Video Generation | BAIR Robot Pushing | FVD score | 149.22 | VRNN 1L
Video Generation | BAIR Robot Pushing | Pred | 28 | VRNN 1L
Video Generation | BAIR Robot Pushing | Train | 10 | VRNN 1L

Related Papers

Multi-Strategy Improved Snake Optimizer Accelerated CNN-LSTM-Attention-Adaboost for Trajectory Prediction (2025-07-21)
World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving (2025-07-17)
Leveraging Pre-Trained Visual Models for AI-Generated Video Detection (2025-07-17)
Taming Diffusion Transformer for Real-Time Mobile Video Generation (2025-07-17)
LoViC: Efficient Long Video Generation with Context Compression (2025-07-17)
Generative Click-through Rate Prediction with Applications to Search Advertising (2025-07-15)
$I^{2}$-World: Intra-Inter Tokenization for Efficient Dynamic 4D Scene Forecasting (2025-07-12)
Conformation-Aware Structure Prediction of Antigen-Recognizing Immune Proteins (2025-07-11)