Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis

Chenfei Wu, Jian Liang, Xiaowei Hu, Zhe Gan, JianFeng Wang, Lijuan Wang, Zicheng Liu, Yuejian Fang, Nan Duan

2022-07-20 | Text-to-Image Generation, Image Outpainting, Video Generation

Abstract

In this paper, we present NUWA-Infinity, a generative model for infinite visual synthesis, which is defined as the task of generating arbitrarily-sized high-resolution images or long-duration videos. An autoregressive over autoregressive generation mechanism is proposed to deal with this variable-size generation task, where a global patch-level autoregressive model considers the dependencies between patches, and a local token-level autoregressive model considers the dependencies between visual tokens within each patch. A Nearby Context Pool (NCP) is introduced to cache related patches that have already been generated as the context for the patch currently being generated, which can significantly reduce computation cost without sacrificing patch-level dependency modeling. An Arbitrary Direction Controller (ADC) is used to decide suitable generation orders for different visual synthesis tasks and to learn order-aware positional embeddings. Compared to DALL-E, Imagen and Parti, NUWA-Infinity can generate high-resolution images of arbitrary size and additionally supports long-duration video generation. Compared to NUWA, which also covers images and videos, NUWA-Infinity has superior visual synthesis capabilities in terms of resolution and variable-size generation. The GitHub link is https://github.com/microsoft/NUWA. The homepage link is https://nuwa-infinity.microsoft.com.
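The two-level loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the names (`generate`, `nearby_context`, `sample_token`), the fixed raster order (standing in for what the ADC would choose), the distance-based selection and eviction rules for the pool, and the random sampler (standing in for a trained token-level model) are all assumptions made purely for illustration.

```python
import random
from typing import Dict, List, Tuple

Patch = List[int]        # a patch is a flat list of visual token ids
Coord = Tuple[int, int]  # (row, col) position of a patch in the grid


def nearby_context(pool: Dict[Coord, Patch], coord: Coord, radius: int) -> Dict[Coord, Patch]:
    """Select cached patches within `radius` (Chebyshev distance) of `coord`."""
    r, c = coord
    return {p: toks for p, toks in pool.items()
            if max(abs(p[0] - r), abs(p[1] - c)) <= radius}


def sample_token(context: Dict[Coord, Patch], prefix: Patch,
                 vocab_size: int, rng: random.Random) -> int:
    """Hypothetical stand-in for the local token-level model (random here)."""
    return rng.randrange(vocab_size)


def generate(rows: int, cols: int, tokens_per_patch: int = 4,
             vocab_size: int = 16, radius: int = 1, seed: int = 0):
    rng = random.Random(seed)
    pool: Dict[Coord, Patch] = {}    # toy "nearby context pool"
    canvas: Dict[Coord, Patch] = {}  # every patch generated so far
    # Global, patch-level loop. Raster order is an assumption of this sketch.
    for r in range(rows):
        for c in range(cols):
            ctx = nearby_context(pool, (r, c), radius)
            patch: Patch = []
            # Local, token-level loop: each token sees the context and the prefix.
            for _ in range(tokens_per_patch):
                patch.append(sample_token(ctx, patch, vocab_size, rng))
            canvas[(r, c)] = patch
            pool[(r, c)] = patch
            # Evict patches too far above to be near any patch still to come.
            for key in [k for k in pool if k[0] < r - radius]:
                del pool[key]
    return canvas, pool
```

Under these assumptions the pool never holds more than `(radius + 1) * cols` patches, so the context seen by each patch stays bounded however many rows are generated, which is the kind of saving the abstract attributes to caching only nearby patches.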

Results

Task                      Dataset  Metric                     Value  Model
Image Generation          LHQC     Block-FID                  9.71   NUWA-Infinity
Text-to-Image Generation  LHQC     Block-FID                  9.71   NUWA-Infinity
Image Outpainting         LHQC     Block-FID (Right Extend)   6.43   NUWA-Infinity w/o text
Image Outpainting         LHQC     Block-FID (Down Extend)    11.47  NUWA-Infinity w/o text
Image Outpainting         LHQC     Block-FID (Left Extend)    6.71   NUWA-Infinity w/o text
Image Outpainting         LHQC     Block-FID (Up Extend)      8.03   NUWA-Infinity w/o text
Image Outpainting         LHQC     Block-FID (Right Extend)   6.45   NUWA-Infinity
Image Outpainting         LHQC     Block-FID (Down Extend)    9.84   NUWA-Infinity
Image Outpainting         LHQC     Block-FID (Left Extend)    6.72   NUWA-Infinity
Image Outpainting         LHQC     Block-FID (Up Extend)      7.43   NUWA-Infinity
10-shot image generation  LHQC     Block-FID                  9.71   NUWA-Infinity
1 Image, 2*2 Stitchi      LHQC     Block-FID                  9.71   NUWA-Infinity

Related Papers

World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving (2025-07-17)
Leveraging Pre-Trained Visual Models for AI-Generated Video Detection (2025-07-17)
Taming Diffusion Transformer for Real-Time Mobile Video Generation (2025-07-17)
LoViC: Efficient Long Video Generation with Context Compression (2025-07-17)
CharaConsist: Fine-Grained Consistent Character Generation (2025-07-15)
$I^{2}$-World: Intra-Inter Tokenization for Efficient Dynamic 4D Scene Forecasting (2025-07-12)
Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective (2025-07-11)
Scaling RL to Long Videos (2025-07-10)