George Eskandar, Mohamed Abdelsamad, Karim Armanious, Bin Yang
Semantic Image Synthesis (SIS) is a subclass of image-to-image translation in which a photorealistic image is synthesized from a segmentation mask. SIS has mostly been addressed as a supervised problem. However, state-of-the-art methods depend on large amounts of labeled data and cannot be applied in an unpaired setting. On the other hand, generic unpaired image-to-image translation frameworks underperform in comparison, because they color-code semantic layouts and feed them to traditional convolutional networks, which then learn correspondences in appearance instead of semantic content. In this initial work, we propose a new Unsupervised paradigm for Semantic Image Synthesis (USIS) as a first step towards closing the performance gap between paired and unpaired settings. Notably, the framework deploys a SPADE generator that learns to output images with visually separable semantic classes using a self-supervised segmentation loss. Furthermore, to match the color and texture distribution of real images without losing high-frequency information, we propose whole-image wavelet-based discrimination. We test our methodology on three challenging datasets and demonstrate its ability to generate multimodal photorealistic images with improved quality in the unpaired setting.
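
To make the idea of wavelet-based discrimination more concrete, the sketch below shows how a single-level 2D wavelet transform splits an image into one coarse band and three detail bands that retain high-frequency content. It is only an illustration: the abstract does not say which wavelet family or how many decomposition levels USIS uses, so the Haar wavelet, the function name `haar_dwt2`, and the band names are assumptions made here, and how the discriminator consumes the bands is not shown.

```python
def haar_dwt2(image):
    """Single-level 2D Haar transform of a grayscale image (list of rows).

    Assumption: the Haar wavelet is used purely for illustration; the
    source does not name the wavelet employed by USIS.

    Returns four half-resolution sub-bands:
      ll: coarse approximation (scaled local average)
      lh: differences across columns within each 2x2 block
      hl: differences across rows within each 2x2 block
      hh: diagonal differences
    Height and width must be even.
    """
    height, width = len(image), len(image[0])
    if height % 2 or width % 2:
        raise ValueError("height and width must be even")

    ll, lh, hl, hh = [], [], [], []
    for r in range(0, height, 2):
        ll_row, lh_row, hl_row, hh_row = [], [], [], []
        for c in range(0, width, 2):
            # The four pixels of one 2x2 block.
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            # Orthonormal Haar coefficients (total energy is preserved).
            ll_row.append((a + b + d + e) / 2)
            lh_row.append((a - b + d - e) / 2)
            hl_row.append((a + b - d - e) / 2)
            hh_row.append((a - b - d + e) / 2)
        ll.append(ll_row)
        lh.append(lh_row)
        hl.append(hl_row)
        hh.append(hh_row)
    return ll, lh, hl, hh
```

For a perfectly flat image, all three detail bands are zero, while edges and fine texture show up as non-zero detail coefficients. This is the kind of information that a discriminator working only on downsampled images would tend to lose.
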
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image-to-Image Translation | COCO-Stuff Labels-to-Photos | FID | 27.8 | USIS |
| Image-to-Image Translation | COCO-Stuff Labels-to-Photos | mIoU | 14.06 | USIS |
| Image-to-Image Translation | Cityscapes Labels-to-Photo | FID | 53.67 | USIS |
| Image-to-Image Translation | Cityscapes Labels-to-Photo | mIoU | 44.78 | USIS |
| Image-to-Image Translation | ADE20K Labels-to-Photos | FID | 33.2 | USIS |
| Image-to-Image Translation | ADE20K Labels-to-Photos | mIoU | 17.38 | USIS |
| Image Generation | COCO-Stuff Labels-to-Photos | FID | 27.8 | USIS |
| Image Generation | COCO-Stuff Labels-to-Photos | mIoU | 14.06 | USIS |
| Image Generation | Cityscapes Labels-to-Photo | FID | 53.67 | USIS |
| Image Generation | Cityscapes Labels-to-Photo | mIoU | 44.78 | USIS |
| Image Generation | ADE20K Labels-to-Photos | FID | 33.2 | USIS |
| Image Generation | ADE20K Labels-to-Photos | mIoU | 17.38 | USIS |
| 1 Image, 2*2 Stitching | COCO-Stuff Labels-to-Photos | FID | 27.8 | USIS |
| 1 Image, 2*2 Stitching | COCO-Stuff Labels-to-Photos | mIoU | 14.06 | USIS |
| 1 Image, 2*2 Stitching | Cityscapes Labels-to-Photo | FID | 53.67 | USIS |
| 1 Image, 2*2 Stitching | Cityscapes Labels-to-Photo | mIoU | 44.78 | USIS |
| 1 Image, 2*2 Stitching | ADE20K Labels-to-Photos | FID | 33.2 | USIS |
| 1 Image, 2*2 Stitching | ADE20K Labels-to-Photos | mIoU | 17.38 | USIS |
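
For readers unfamiliar with the mIoU metric reported above, the following is a minimal sketch of the standard mean intersection-over-union computation between a predicted label map and a reference label map. The function name `mean_iou` and the choice to skip classes absent from both maps are assumptions made for this example. The source does not describe the evaluation protocol (for instance, which segmentation network produces the predicted labels from generated images), so only the metric itself is shown. The table values appear to be percentages, whereas this sketch returns a fraction in [0, 1].

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union between two integer label maps.

    pred, target: lists of rows with class indices in [0, num_classes).
    Classes that appear in neither map are skipped (an assumption of this
    sketch; some evaluation scripts handle empty classes differently).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel counts once toward both intersection and union.
                intersection[p] += 1
                union[p] += 1
            else:
                # Pixel belongs to the union of both classes involved.
                union[p] += 1
                union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Usage: class 0 has IoU 1/2 and class 1 has IoU 2/3.
pred = [[0, 0], [1, 1]]
target = [[0, 1], [1, 1]]
score = mean_iou(pred, target, num_classes=2)
```

Multiplying the returned fraction by 100 gives a value on the same scale as the mIoU column.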