Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer

Songwei Ge, Thomas Hayes, Harry Yang, Xi Yin, Guan Pang, David Jacobs, Jia-Bin Huang, Devi Parikh

2022-04-07 | Video Generation

Abstract

Videos are created to express emotion, exchange information, and share experiences. Video synthesis has intrigued researchers for a long time. Despite the rapid progress driven by advances in visual synthesis, most existing studies focus on improving the frames' quality and the transitions between them, while little progress has been made in generating longer videos. In this paper, we present a method that builds on 3D-VQGAN and transformers to generate videos with thousands of frames. Our evaluation shows that our model trained on 16-frame video clips from standard benchmarks such as UCF-101, Sky Time-lapse, and Taichi-HD datasets can generate diverse, coherent, and high-quality long videos. We also showcase conditional extensions of our approach for generating meaningful long videos by incorporating temporal information with text and audio. Videos and code can be found at https://songweige.github.io/projects/tats/index.html.

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Video | UCF-101 | FVD16 | 332 | TATS (128x128, class-conditional) |
| Video | UCF-101 | Inception Score | 79.28 | TATS (128x128, class-conditional) |
| Video | UCF-101 | FVD16 | 420 | TATS (128x128, unconditional) |
| Video | UCF-101 | Inception Score | 57.63 | TATS (128x128, unconditional) |
| Video | UCF-101 | FVD16 | 635 | TATS (256x256) |
| Video | UCF-101 | KVD16 | 55 | TATS (256x256) |
| Video Generation | UCF-101 | FVD16 | 332 | TATS (128x128, class-conditional) |
| Video Generation | UCF-101 | Inception Score | 79.28 | TATS (128x128, class-conditional) |
| Video Generation | UCF-101 | FVD16 | 420 | TATS (128x128, unconditional) |
| Video Generation | UCF-101 | Inception Score | 57.63 | TATS (128x128, unconditional) |
| Video Generation | UCF-101 | FVD16 | 635 | TATS (256x256) |
| Video Generation | UCF-101 | KVD16 | 55 | TATS (256x256) |
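
For readers unfamiliar with the FVD metric reported above: Fréchet Video Distance is commonly computed as the Fréchet distance between two Gaussians fitted to feature vectors of real and generated videos, with the features usually taken from a pretrained video network (lower is better). The sketch below shows only that distance computation. It is not the authors' evaluation code; the function name `frechet_distance` is hypothetical, and the random arrays merely stand in for real video features.

```python
import numpy as np


def frechet_distance(feats_real, feats_gen):
    """Frechet distance between Gaussians fitted to two (n, d) feature arrays, d >= 2."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    # tr(sqrt(cov_r @ cov_g)) equals the sum of the square roots of the
    # eigenvalues of the product, which are real and non-negative in theory.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_sqrt)


# Placeholder features (assumption): in practice these would be embeddings
# of real and generated clips from a pretrained video classifier.
rng = np.random.default_rng(0)
real = rng.normal(size=(200, 4))
fake = real + np.array([1.0, 2.0, 0.0, 0.0])  # same covariance, shifted mean
print(frechet_distance(real, fake))
```

Because the two placeholder sets share a covariance, the distance here reduces to the squared distance between the means, which makes the function easy to sanity-check before plugging in real features.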

Related Papers

World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving (2025-07-17)
Leveraging Pre-Trained Visual Models for AI-Generated Video Detection (2025-07-17)
Taming Diffusion Transformer for Real-Time Mobile Video Generation (2025-07-17)
LoViC: Efficient Long Video Generation with Context Compression (2025-07-17)
$I^{2}$-World: Intra-Inter Tokenization for Efficient Dynamic 4D Scene Forecasting (2025-07-12)
Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective (2025-07-11)
Scaling RL to Long Videos (2025-07-10)
Martian World Models: Controllable Video Synthesis with Physically Accurate 3D Reconstructions (2025-07-10)