Chitwan Saharia, Jonathan Ho, William Chan, Tim Salimans, David J. Fleet, Mohammad Norouzi
We present SR3, an approach to image Super-Resolution via Repeated Refinement. SR3 adapts denoising diffusion probabilistic models to conditional image generation and performs super-resolution through a stochastic denoising process. Inference starts with pure Gaussian noise and iteratively refines the noisy output using a U-Net model trained on denoising at various noise levels. SR3 exhibits strong performance on super-resolution tasks at different magnification factors, on faces and natural images. We conduct human evaluation on a standard 8× face super-resolution task on CelebA-HQ, comparing with state-of-the-art GAN methods. SR3 achieves a fool rate close to 50%, suggesting photo-realistic outputs, while GANs do not exceed a fool rate of 34%. We further show the effectiveness of SR3 in cascaded image generation, where generative models are chained with super-resolution models, yielding a competitive FID score of 11.3 on ImageNet.
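The inference procedure described in the abstract (start from pure Gaussian noise, then repeatedly denoise while conditioning on the low-resolution input) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the standard DDPM ancestral sampling update, a linear noise schedule, and nearest-neighbour upsampling of the conditioning image. The names `upsample_nearest`, `linear_beta_schedule`, `placeholder_denoiser`, and `iterative_refinement` are hypothetical, and `placeholder_denoiser` is an analytic stand-in for the trained U-Net.

```python
import numpy as np

def upsample_nearest(low_res, factor):
    # Assumption: nearest-neighbour upsampling, chosen only to keep the sketch short.
    return np.repeat(np.repeat(low_res, factor, axis=0), factor, axis=1)

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=2e-2):
    # Assumption: a linear variance schedule, as commonly used with DDPMs.
    return np.linspace(beta_start, beta_end, num_steps)

def placeholder_denoiser(cond, y_t, noise_level):
    # Hypothetical stand-in for the trained U-Net. It returns the noise that
    # would explain y_t if the clean image were exactly the conditioning image.
    return (y_t - np.sqrt(noise_level) * cond) / np.sqrt(1.0 - noise_level)

def iterative_refinement(cond, denoiser, betas, rng):
    alphas = 1.0 - betas
    gammas = np.cumprod(alphas)           # cumulative signal level per step
    y = rng.standard_normal(cond.shape)   # inference starts from pure Gaussian noise
    for t in range(len(betas) - 1, -1, -1):
        eps = denoiser(cond, y, gammas[t])
        # Remove the predicted noise (DDPM posterior mean).
        y = (y - (1.0 - alphas[t]) / np.sqrt(1.0 - gammas[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # Re-inject a smaller amount of noise on every step except the last.
            y = y + np.sqrt(betas[t]) * rng.standard_normal(y.shape)
    return y

rng = np.random.default_rng(0)
low_res = rng.random((4, 4))
cond = upsample_nearest(low_res, 8)       # 8x magnification: 4x4 -> 32x32
output = iterative_refinement(cond, placeholder_denoiser, linear_beta_schedule(50), rng)
print(output.shape)
```

With the placeholder denoiser the loop simply recovers the upsampled input, which makes the control flow easy to check. In a real system the denoiser would be a learned network that takes the conditioning image, the current noisy estimate, and the noise level, and the loop structure would stay the same.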
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Super-Resolution | CelebA-HQ 128x128 | Consistency | 2.68 | SR3 |
| Super-Resolution | CelebA-HQ 128x128 | PSNR | 23.04 | SR3 |
| Super-Resolution | CelebA-HQ 128x128 | SSIM | 0.65 | SR3 |