George Eskandar, Mohamed Abdelsamad, Karim Armanious, Shuai Zhang, Bin Yang
Semantic Image Synthesis (SIS) is a subclass of image-to-image translation in which a semantic layout is used to generate a photorealistic image. State-of-the-art conditional Generative Adversarial Networks (GANs) need large amounts of paired data to accomplish this task, while generic unpaired image-to-image translation frameworks underperform in comparison because they color-code semantic layouts and learn correspondences in appearance instead of semantic content. Starting from the assumption that a high-quality generated image should be segmented back to its semantic layout, we propose a new Unsupervised paradigm for SIS (USIS) that makes use of a self-supervised segmentation loss and whole-image wavelet-based discrimination. Furthermore, to match the high-frequency distribution of real images, we propose a novel generator architecture in the wavelet domain. We test our methodology on three challenging datasets and demonstrate its ability to bridge the performance gap between paired and unpaired models.
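The two ingredients named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a single-level orthonormal Haar transform as the wavelet decomposition and a plain, unweighted pixel-wise cross-entropy as the segmentation loss; the wavelet family, number of levels, and loss weighting actually used by USIS are not stated here. The function names `haar_dwt2` and `segmentation_cross_entropy` are hypothetical.

```python
import numpy as np


def haar_dwt2(img):
    """One-level 2D orthonormal Haar transform of an (H, W) array.

    H and W must be even. Returns four sub-bands of shape (H/2, W/2):
    one low-frequency approximation and three high-frequency details.
    """
    a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0       # local average (low frequency)
    col_diff = (a - b + c - d) / 2.0     # left minus right column
    row_diff = (a + b - c - d) / 2.0     # top minus bottom row
    diag_diff = (a - b - c + d) / 2.0    # diagonal difference
    return approx, col_diff, row_diff, diag_diff


def segmentation_cross_entropy(logits, layout):
    """Mean pixel-wise cross-entropy.

    logits: (C, H, W) scores from a segmenter applied to a generated image.
    layout: (H, W) integer class map that conditioned the generator.
    Assumption: unweighted cross-entropy; the paper's loss may differ.
    """
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=0, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    h, w = layout.shape
    rows = np.arange(h)[:, None]
    cols = np.arange(w)[None, :]
    # Pick the log-probability of the layout's class at every pixel.
    picked = log_probs[layout, rows, cols]
    return float(-picked.mean())
```

In this sketch, a discriminator operating in the wavelet domain would receive the four sub-bands instead of raw pixels, so that mismatches in the high-frequency bands are directly visible to it. The segmentation loss is low only when the generated image can be segmented back to the layout that produced it, which is the assumption the abstract starts from.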
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image-to-Image Translation | COCO-Stuff Labels-to-Photos | FID | 28.6 | USIS-Wavelet |
| Image-to-Image Translation | COCO-Stuff Labels-to-Photos | mIoU | 13.4 | USIS-Wavelet |
| Image-to-Image Translation | Cityscapes Labels-to-Photo | FID | 50.14 | USIS-Wavelet |
| Image-to-Image Translation | Cityscapes Labels-to-Photo | mIoU | 42.32 | USIS-Wavelet |
| Image-to-Image Translation | ADE20K Labels-to-Photos | FID | 34.5 | USIS-Wavelet |
| Image-to-Image Translation | ADE20K Labels-to-Photos | mIoU | 16.95 | USIS-Wavelet |
| Image Generation | COCO-Stuff Labels-to-Photos | FID | 28.6 | USIS-Wavelet |
| Image Generation | COCO-Stuff Labels-to-Photos | mIoU | 13.4 | USIS-Wavelet |
| Image Generation | Cityscapes Labels-to-Photo | FID | 50.14 | USIS-Wavelet |
| Image Generation | Cityscapes Labels-to-Photo | mIoU | 42.32 | USIS-Wavelet |
| Image Generation | ADE20K Labels-to-Photos | FID | 34.5 | USIS-Wavelet |
| Image Generation | ADE20K Labels-to-Photos | mIoU | 16.95 | USIS-Wavelet |
| 1 Image, 2*2 Stitching | COCO-Stuff Labels-to-Photos | FID | 28.6 | USIS-Wavelet |
| 1 Image, 2*2 Stitching | COCO-Stuff Labels-to-Photos | mIoU | 13.4 | USIS-Wavelet |
| 1 Image, 2*2 Stitching | Cityscapes Labels-to-Photo | FID | 50.14 | USIS-Wavelet |
| 1 Image, 2*2 Stitching | Cityscapes Labels-to-Photo | mIoU | 42.32 | USIS-Wavelet |
| 1 Image, 2*2 Stitching | ADE20K Labels-to-Photos | FID | 34.5 | USIS-Wavelet |
| 1 Image, 2*2 Stitching | ADE20K Labels-to-Photos | mIoU | 16.95 | USIS-Wavelet |
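
For readers unfamiliar with the metrics in the table: a lower FID is better and a higher mIoU is better. The sketch below shows how a mean intersection-over-union between a predicted label map and a reference layout could be computed for a single image. It is an illustration only. The table values appear to be percentages, and benchmark mIoU is usually accumulated over a whole dataset with a pretrained segmentation network, neither of which this sketch reproduces. The function name `mean_iou` is hypothetical.

```python
import numpy as np


def mean_iou(pred, target, num_classes):
    """Mean IoU between two integer label maps of the same shape.

    Classes that appear in neither map are skipped so they do not
    distort the average.
    """
    ious = []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both maps
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))


pred = np.array([[0, 0], [1, 1]])
target = np.array([[0, 1], [1, 1]])
score = mean_iou(pred, target, num_classes=3)  # class 2 is absent and skipped
```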