PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher

Dongjun Kim, Chieh-Hsin Lai, Wei-Hsiang Liao, Yuhta Takida, Naoki Murata, Toshimitsu Uesaka, Yuki Mitsufuji, Stefano Ermon

2024-05-23Super-Resolution Image Generation

Paper PDF Code(official)

Abstract

The diffusion model performs remarkable in generating high-dimensional content but is computationally intensive, especially during training. We propose Progressive Growing of Diffusion Autoencoder (PaGoDA), a novel pipeline that reduces the training costs through three stages: training diffusion on downsampled data, distilling the pretrained diffusion, and progressive super-resolution. With the proposed pipeline, PaGoDA achieves a $64\times$ reduced cost in training its diffusion model on 8x downsampled data; while at the inference, with the single-step, it performs state-of-the-art on ImageNet across all resolutions from 64x64 to 512x512, and text-to-image. PaGoDA's pipeline can be applied directly in the latent space, adding compression alongside the pre-trained autoencoder in Latent Diffusion Models (e.g., Stable Diffusion). The code is available at https://github.com/sony/pagoda.

Results

Task	Dataset	Metric	Value	Model
Image Generation	ImageNet 64x64	FID	1.21	PaGoDA
Image Generation	ImageNet 64x64	Inception Score	76.47	PaGoDA
Image Generation	ImageNet 64x64	NFE	1	PaGoDA
Image Generation	ImageNet 32x32	FID	0.79	PaGoDA
Image Generation	ImageNet 128x128	FID	1.48	PaGoDA
Image Generation	ImageNet 512x512	FID	1.8	PaGoDA
Image Generation	ImageNet 256x256	FID	1.56	PaGoDA

Related Papers

SpectraLift: Physics-Guided Spectral-Inversion Network for Self-Supervised Hyperspectral Image Super-Resolution2025-07-17 fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting2025-07-17 Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection2025-07-17 FashionPose: Text to Pose to Relight Image Generation for Personalized Fashion Visualization2025-07-17 A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing constraints2025-07-17 Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images2025-07-17 FADE: Adversarial Concept Erasure in Flow Models2025-07-16 CharaConsist: Fine-Grained Consistent Character Generation2025-07-15