Chenfei Wu, Jian Liang, Xiaowei Hu, Zhe Gan, JianFeng Wang, Lijuan Wang, Zicheng Liu, Yuejian Fang, Nan Duan
In this paper, we present NUWA-Infinity, a generative model for infinite visual synthesis, which is defined as the task of generating arbitrarily-sized high-resolution images or long-duration videos. An autoregressive-over-autoregressive generation mechanism is proposed to handle this variable-size generation task: a global patch-level autoregressive model captures the dependencies between patches, and a local token-level autoregressive model captures the dependencies between visual tokens within each patch. A Nearby Context Pool (NCP) is introduced to cache related patches that have already been generated as the context for the patch currently being generated, which significantly reduces computation cost without sacrificing patch-level dependency modeling. An Arbitrary Direction Controller (ADC) is used to decide suitable generation orders for different visual synthesis tasks and to learn order-aware positional embeddings. Compared to DALL-E, Imagen and Parti, NUWA-Infinity can generate high-resolution images of arbitrary size and additionally supports long-duration video generation. Compared to NUWA, which also covers images and videos, NUWA-Infinity has superior visual synthesis capabilities in terms of resolution and variable-size generation. The GitHub link is https://github.com/microsoft/NUWA. The homepage link is https://nuwa-infinity.microsoft.com.
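The nested generation loop and the context cache described in the abstract can be illustrated with a toy sketch. This is not the released NUWA-Infinity code: the names `sample_token`, `generate`, and `radius` are hypothetical, the token sampler is a random stand-in for a learned model, and a fixed raster order is assumed where the paper uses the ADC to choose the order.

```python
import random
from typing import Dict, List, Tuple

Patch = List[int]
Coord = Tuple[int, int]


def sample_token(context: List[Patch], prefix: List[int], rng: random.Random) -> int:
    """Hypothetical stand-in for the local token-level autoregressive model."""
    # A real model would condition on `context` and `prefix`; this toy
    # version draws a random token id so the control flow stays runnable.
    return rng.randrange(16)


def generate(rows: int, cols: int, tokens_per_patch: int = 4,
             radius: int = 1, seed: int = 0) -> Tuple[Dict[Coord, Patch], int]:
    """Generate a rows x cols grid of patches; return the canvas and peak pool size."""
    rng = random.Random(seed)
    pool: Dict[Coord, Patch] = {}    # toy "nearby context pool"
    canvas: Dict[Coord, Patch] = {}
    max_pool = 0
    for r in range(rows):            # global patch-level loop (raster order assumed)
        for c in range(cols):
            # Context: cached patches within `radius` of the current patch.
            context = [p for (pr, pc), p in pool.items()
                       if abs(pr - r) <= radius and abs(pc - c) <= radius]
            patch: Patch = []
            for _ in range(tokens_per_patch):   # local token-level loop
                patch.append(sample_token(context, patch, rng))
            canvas[(r, c)] = patch
            pool[(r, c)] = patch
            # Evict patches that cannot be within `radius` of any patch
            # still to be generated in raster order.
            stale = [k for k in pool
                     if k[0] < r - radius
                     or (k[0] == r - radius and k[1] < c + 1 - radius)]
            for k in stale:
                del pool[k]
            max_pool = max(max_pool, len(pool))
    return canvas, max_pool
```

Under these assumptions, the cache holds only patches that can still neighbor a future patch, so its size is bounded by the grid width and not by the number of patches generated so far. That bound is the property that lets the canvas grow without the context growing with it.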
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Generation | LHQC | Block-FID | 9.71 | NUWA-Infinity |
| Text-to-Image Generation | LHQC | Block-FID | 9.71 | NUWA-Infinity |
| Image Outpainting | LHQC | Block-FID (Right Extend) | 6.43 | NUWA-Infinity w/o text |
| Image Outpainting | LHQC | Block-FID (Down Extend) | 11.47 | NUWA-Infinity w/o text |
| Image Outpainting | LHQC | Block-FID (Left Extend) | 6.71 | NUWA-Infinity w/o text |
| Image Outpainting | LHQC | Block-FID (Up Extend) | 8.03 | NUWA-Infinity w/o text |
| Image Outpainting | LHQC | Block-FID (Right Extend) | 6.45 | NUWA-Infinity |
| Image Outpainting | LHQC | Block-FID (Down Extend) | 9.84 | NUWA-Infinity |
| Image Outpainting | LHQC | Block-FID (Left Extend) | 6.72 | NUWA-Infinity |
| Image Outpainting | LHQC | Block-FID (Up Extend) | 7.43 | NUWA-Infinity |
| 10-Shot Image Generation | LHQC | Block-FID | 9.71 | NUWA-Infinity |
| 1 Image, 2*2 Stitchi | LHQC | Block-FID | 9.71 | NUWA-Infinity |