Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Video Pixel Networks

Nal Kalchbrenner, Aaron van den Oord, Karen Simonyan, Ivo Danihelka, Oriol Vinyals, Alex Graves, Koray Kavukcuoglu

Published: 2016-10-03 · ICML 2017 · Task: Video Prediction
Links: Paper · PDF · Code

Abstract

We propose a probabilistic video model, the Video Pixel Network (VPN), that estimates the discrete joint distribution of the raw pixel values in a video. The model and the neural architecture reflect the time, space and color structure of video tensors and encode it as a four-dimensional dependency chain. The VPN approaches the best possible performance on the Moving MNIST benchmark, a leap over the previous state of the art, and the generated videos show only minor deviations from the ground truth. The VPN also produces detailed samples on the action-conditional Robotic Pushing benchmark and generalizes to the motion of novel objects.
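A minimal sketch (not the authors' code) of the four-dimensional dependency chain the abstract describes: the joint distribution over a discrete video tensor factorizes autoregressively over time, height, width, and color channel, with each pixel value modeled as a 256-way softmax conditioned on everything earlier in the ordering. The `context_logits` callable here is a hypothetical stand-in for the VPN's convolutional encoder and PixelCNN-style decoder.

```python
import numpy as np

def log_softmax(x):
    # Numerically stable log-softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

def joint_log_prob(video, context_logits, n_values=256):
    """Log-probability of a discrete video tensor (T, H, W, C) under an
    autoregressive factorization over time, space, and color channels --
    the four-dimensional dependency chain described in the abstract."""
    total = 0.0
    T, H, W, C = video.shape
    for t in range(T):
        for y in range(H):
            for x in range(W):
                for c in range(C):
                    # The model may only condition on values earlier
                    # in the (t, y, x, c) ordering.
                    logits = context_logits(video, t, y, x, c)  # (n_values,)
                    total += log_softmax(logits)[video[t, y, x, c]]
    return total

# Toy stand-in model: uniform logits (a real VPN conditions on past frames).
uniform = lambda video, t, y, x, c: np.zeros(256)
toy_video = np.random.randint(0, 256, size=(2, 4, 4, 3))
lp = joint_log_prob(toy_video, uniform)
# With uniform logits, log p = -N * log(256) where N = 2*4*4*3 = 96.
```

Sampling works the same way in reverse: draw each pixel value from its softmax, write it into the tensor, and move to the next position in the chain, which is why generation is sequential in these models.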

Results

Task              Dataset  Cond. frames  Pred. frames  Metric  Value  Model
Video Prediction  KTH      10            20            PSNR    23.76  VPN
Video Prediction  KTH      10            20            SSIM    0.746  VPN
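For reference, PSNR (used in the table above) is a simple function of the mean squared error between a predicted frame and the ground truth. A minimal sketch, assuming 8-bit frames; the example values below are illustrative, not taken from the paper:

```python
import numpy as np

def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio between two frames, in dB (higher is
    better): PSNR = 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(max_val ** 2 / mse)

# Illustration: a frame off by a constant 16 at every pixel.
target = np.full((64, 64), 128, dtype=np.uint8)
pred = target + 16
# MSE = 256, so PSNR = 10 * log10(255^2 / 256) ~ 24.05 dB.
```

SSIM, the other metric in the table, instead compares local windowed statistics (means, variances, covariances) of the two frames and lies in [-1, 1]; implementations such as `skimage.metrics.structural_similarity` are commonly used.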

Related Papers

Epona: Autoregressive Diffusion World Model for Autonomous Driving (2025-06-30)
Whole-Body Conditioned Egocentric Video Prediction (2025-06-26)
MinD: Unified Visual Imagination and Control via Hierarchical World Models (2025-06-23)
AMPLIFY: Actionless Motion Priors for Robot Learning from Videos (2025-06-17)
Towards a Generalizable Bimanual Foundation Policy via Flow-based Video Prediction (2025-05-30)
Autoregression-free video prediction using diffusion model for mitigating error propagation (2025-05-28)
Consistent World Models via Foresight Diffusion (2025-05-22)
Programmatic Video Prediction Using Large Language Models (2025-05-20)