Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Diffusion Models for Video Prediction and Infilling

Tobias Höppe, Arash Mehrjou, Stefan Bauer, Didrik Nielsen, Andrea Dittadi

2022-06-15 · Tasks: Video Prediction, Prediction, Video Generation
Links: Paper, PDF, Code (official)

Abstract

Predicting and anticipating future outcomes or reasoning about missing information in a sequence are critical skills for agents to be able to make intelligent decisions. This requires strong, temporally coherent generative capabilities. Diffusion models have shown remarkable success in several generative tasks, but have not been extensively explored in the video domain. We present Random-Mask Video Diffusion (RaMViD), which extends image diffusion models to videos using 3D convolutions, and introduces a new conditioning technique during training. By varying the mask we condition on, the model is able to perform video prediction, infilling, and upsampling. Due to our simple conditioning scheme, we can utilize the same architecture as used for unconditional training, which allows us to train the model in a conditional and unconditional fashion at the same time. We evaluate RaMViD on two benchmark datasets for video prediction, on which we achieve state-of-the-art results, and one for video generation. High-resolution videos are provided at https://sites.google.com/view/video-diffusion-prediction.
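The random-mask conditioning described in the abstract can be sketched in a few lines. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation (the official code is linked from this page). It assumes that videos are arrays of shape (T, H, W, C), that the model is trained with the standard DDPM noise-prediction objective, that conditioning frames are fed to the network unnoised and excluded from the loss, and that "upsampling" means temporal upsampling (filling in skipped frames). The helper names `make_mask`, `sample_training_mask`, `noisy_input`, `masked_loss` and the parameter `p_uncond` are hypothetical, and the 3D-convolutional denoiser itself is omitted.

```python
import numpy as np


def make_mask(num_frames, task, num_cond=2, rng=None):
    """Boolean mask over frames: True = conditioning (observed) frame.

    Hypothetical helper; the task-to-mask mapping is an illustrative assumption.
    """
    mask = np.zeros(num_frames, dtype=bool)
    if task == "prediction":
        mask[:num_cond] = True            # observe the first frames, generate the rest
    elif task == "infilling":
        mask[:num_cond] = True            # observe both ends, generate the middle
        mask[-num_cond:] = True
    elif task == "upsampling":
        mask[::2] = True                  # observe every other frame, generate the gaps
    elif task == "random":
        rng = rng or np.random.default_rng()
        k = rng.integers(1, num_frames)   # 1..T-1 conditioning frames, so at least one is generated
        mask[rng.choice(num_frames, size=k, replace=False)] = True
    elif task == "unconditional":
        pass                              # nothing observed: generate every frame
    else:
        raise ValueError(f"unknown task: {task}")
    return mask


def sample_training_mask(num_frames, p_uncond, rng):
    """With probability p_uncond (hypothetical parameter) train unconditionally,
    otherwise condition on a random subset of frames."""
    if rng.random() < p_uncond:
        return make_mask(num_frames, "unconditional")
    return make_mask(num_frames, "random", rng=rng)


def noisy_input(video, mask, alpha_bar_t, rng):
    """Network input at a diffusion step with cumulative signal level alpha_bar_t.

    Non-conditioning frames are diffused with the DDPM forward process
    x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps; conditioning frames
    are passed through clean, so the same architecture serves both modes.
    """
    noise = rng.standard_normal(video.shape)
    x_t = np.sqrt(alpha_bar_t) * video + np.sqrt(1.0 - alpha_bar_t) * noise
    frame_mask = mask[:, None, None, None]   # broadcast (T,) over (T, H, W, C)
    return np.where(frame_mask, video, x_t), noise


def masked_loss(pred_noise, noise, mask):
    """Noise-prediction MSE computed only over the generated (unmasked) frames."""
    generated = ~mask
    if not generated.any():
        return 0.0
    return float(np.mean((pred_noise[generated] - noise[generated]) ** 2))
```

At sampling time the same network would be reused with a fixed mask, e.g. `make_mask(T, "prediction")` for prediction or `make_mask(T, "infilling")` for infilling, which mirrors the abstract's point that changing only the mask switches between tasks.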

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video | BAIR Robot Pushing | Cond | 1 | RaMViD |
| Video | BAIR Robot Pushing | FVD score | 84.2 | RaMViD |
| Video | BAIR Robot Pushing | Pred | 15 | RaMViD |
| Video | BAIR Robot Pushing | Train | 20 | RaMViD |
| Video | Kinetics-600 12 frames, 64x64 | Cond | 5 | RaMViD |
| Video | Kinetics-600 12 frames, 64x64 | FVD | 16.46 | RaMViD |
| Video | Kinetics-600 12 frames, 64x64 | Pred | 11 | RaMViD |
| Video Prediction | Kinetics-600 12 frames, 64x64 | Cond | 5 | RaMViD |
| Video Prediction | Kinetics-600 12 frames, 64x64 | FVD | 16.46 | RaMViD |
| Video Prediction | Kinetics-600 12 frames, 64x64 | Pred | 11 | RaMViD |
| Video Generation | BAIR Robot Pushing | Cond | 1 | RaMViD |
| Video Generation | BAIR Robot Pushing | FVD score | 84.2 | RaMViD |
| Video Generation | BAIR Robot Pushing | Pred | 15 | RaMViD |
| Video Generation | BAIR Robot Pushing | Train | 20 | RaMViD |

Related Papers

Multi-Strategy Improved Snake Optimizer Accelerated CNN-LSTM-Attention-Adaboost for Trajectory Prediction (2025-07-21)
World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving (2025-07-17)
Leveraging Pre-Trained Visual Models for AI-Generated Video Detection (2025-07-17)
Taming Diffusion Transformer for Real-Time Mobile Video Generation (2025-07-17)
LoViC: Efficient Long Video Generation with Context Compression (2025-07-17)
Generative Click-through Rate Prediction with Applications to Search Advertising (2025-07-15)
I²-World: Intra-Inter Tokenization for Efficient Dynamic 4D Scene Forecasting (2025-07-12)
Conformation-Aware Structure Prediction of Antigen-Recognizing Immune Proteins (2025-07-11)