Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Generating Videos with Scene Dynamics

Carl Vondrick, Hamed Pirsiavash, Antonio Torralba

2016-09-08 · NeurIPS 2016

Tasks: Action Classification, Representation Learning, Video Recognition, Future Prediction, General Classification, Video Understanding, Video Generation, Self-Supervised Action Recognition

Paper · PDF

Abstract

We capitalize on large amounts of unlabeled video in order to learn a model of scene dynamics for both video recognition tasks (e.g. action classification) and video generation tasks (e.g. future prediction). We propose a generative adversarial network for video with a spatio-temporal convolutional architecture that untangles the scene's foreground from the background. Experiments suggest this model can generate tiny videos up to a second at full frame rate better than simple baselines, and we show its utility at predicting plausible futures of static images. Moreover, experiments and visualizations show the model internally learns useful features for recognizing actions with minimal supervision, suggesting scene dynamics are a promising signal for representation learning. We believe generative video models can impact many applications in video understanding and simulation.
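The abstract's key architectural idea is a generator that untangles foreground from background: one stream produces a moving foreground and a soft mask, the other a single static background frame, and the output video is their mask-weighted composite. A minimal numpy sketch of that compositing step (the function name, shapes, and toy data here are illustrative assumptions, not the paper's code):

```python
import numpy as np

def composite_video(foreground, mask, background):
    """Combine the two generator streams described in the paper:
    video(t) = mask(t) * foreground(t) + (1 - mask(t)) * background.

    foreground: (T, H, W, C) moving-object stream
    mask:       (T, H, W, 1) values in [0, 1], marking dynamic regions
    background: (H, W, C)    single static frame, broadcast over time
    """
    assert mask.min() >= 0.0 and mask.max() <= 1.0
    # Broadcasting replicates the static background across all T frames.
    return mask * foreground + (1.0 - mask) * background[np.newaxis]

# Toy example at the paper's output resolution: a 16-frame 64x64 RGB clip.
T, H, W, C = 16, 64, 64, 3
fg = np.random.rand(T, H, W, C)
m = np.random.rand(T, H, W, 1)
bg = np.random.rand(H, W, C)
video = composite_video(fg, m, bg)
print(video.shape)  # (16, 64, 64, 3)
```

Where the mask saturates to 1 the frame comes entirely from the foreground stream; where it is 0 the static background shows through, which is what lets the model keep backgrounds stable while animating objects.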

Results

Task | Dataset | Metric | Value | Model
Video Generation | UCF-101 (16 frames, Unconditional, Single GPU) | Inception Score | 8.18 | VGAN
Video Generation | UCF-101 (16 frames, 64x64, Unconditional) | Inception Score | 8.18 | VGAN
Activity Recognition | UCF101 | 3-fold Accuracy | 52.1 | VideoGan (C3D)
Action Recognition | UCF101 | 3-fold Accuracy | 52.1 | VideoGan (C3D)

Related Papers

Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper (2025-07-20)
Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
Boosting Team Modeling through Tempo-Relational Representation Learning (2025-07-17)
VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding (2025-07-17)
World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving (2025-07-17)
Leveraging Pre-Trained Visual Models for AI-Generated Video Detection (2025-07-17)
Taming Diffusion Transformer for Real-Time Mobile Video Generation (2025-07-17)
LoViC: Efficient Long Video Generation with Context Compression (2025-07-17)