VideoFlow: A Conditional Flow-Based Model for Stochastic Video Generation

Manoj Kumar, Mohammad Babaeizadeh, Dumitru Erhan, Chelsea Finn, Sergey Levine, Laurent Dinh, Durk Kingma

2019-03-04 · ICLR 2020
Tasks: Video Prediction, Predict Future Video Frames, Video Generation

Abstract

Generative models that can model and predict sequences of future events can, in principle, learn to capture complex real-world phenomena, such as physical interactions. However, a central challenge in video prediction is that the future is highly uncertain: a sequence of past observations of events can imply many possible futures. Although a number of recent works have studied probabilistic models that can represent uncertain futures, such models are either extremely expensive computationally, as in the case of pixel-level autoregressive models, or do not directly optimize the likelihood of the data. To our knowledge, our work is the first to propose multi-frame video prediction with normalizing flows, an approach that allows for direct optimization of the data likelihood and produces high-quality stochastic predictions. We describe an approach for modeling the latent space dynamics, and demonstrate that flow-based generative models offer a viable and competitive approach to generative modelling of video.
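To make "direct optimization of the data likelihood" concrete, here is a minimal sketch of the change-of-variables computation that any normalizing flow relies on. It is a toy elementwise affine flow with a standard normal base distribution, not the model described in the paper. All function names and parameters (`shift`, `log_scale`) are hypothetical and chosen for illustration only.

```python
import math


def affine_flow_forward(x, shift, log_scale):
    """Map data x to latent z with an invertible elementwise affine transform.

    Returns z and log|det dz/dx|, which for this transform is -sum(log_scale).
    """
    z = [(xi - b) * math.exp(-s) for xi, b, s in zip(x, shift, log_scale)]
    log_det = -sum(log_scale)
    return z, log_det


def affine_flow_inverse(z, shift, log_scale):
    """Map latent z back to data space (used for sampling)."""
    return [zi * math.exp(s) + b for zi, b, s in zip(z, shift, log_scale)]


def standard_normal_log_prob(z):
    """Log-density of z under an isotropic standard normal base distribution."""
    return sum(-0.5 * (zi * zi + math.log(2.0 * math.pi)) for zi in z)


def log_likelihood(x, shift, log_scale):
    """Exact log p(x) via the change-of-variables formula:
    log p(x) = log p_base(f(x)) + log|det df/dx|.
    """
    z, log_det = affine_flow_forward(x, shift, log_scale)
    return standard_normal_log_prob(z) + log_det


# Hypothetical example values for a 2-dimensional "observation".
x = [0.5, -1.0]
shift = [0.1, 0.2]
log_scale = [0.3, -0.4]
ll = log_likelihood(x, shift, log_scale)
```

Because the transform is invertible and its Jacobian determinant is cheap to compute, `log_likelihood` is exact and can be maximized directly by gradient ascent on the flow parameters. In a conditional setting such as video prediction, one would presumably make parameters like `shift` and `log_scale` functions of past frames. That step is an assumption here and is not shown.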

Results

Task	Dataset	Metric	Value	Model
Video	BAIR Robot Pushing	Cond	3	VideoFlow
Video	BAIR Robot Pushing	Train	10	VideoFlow
Video Generation	BAIR Robot Pushing	Cond	3	VideoFlow
Video Generation	BAIR Robot Pushing	Train	10	VideoFlow

Related Papers

World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving (2025-07-17)
Leveraging Pre-Trained Visual Models for AI-Generated Video Detection (2025-07-17)
Taming Diffusion Transformer for Real-Time Mobile Video Generation (2025-07-17)
LoViC: Efficient Long Video Generation with Context Compression (2025-07-17)
$I^{2}$-World: Intra-Inter Tokenization for Efficient Dynamic 4D Scene Forecasting (2025-07-12)
Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective (2025-07-11)
Scaling RL to Long Videos (2025-07-10)
Martian World Models: Controllable Video Synthesis with Physically Accurate 3D Reconstructions (2025-07-10)