Zhentao Tan, Menglei Chai, Dongdong Chen, Jing Liao, Qi Chu, Bin Liu, Gang Hua, Nenghai Yu
Semantic image synthesis, translating semantic layouts to photo-realistic images, is a one-to-many mapping problem. Though impressive progress has recently been made, diverse semantic synthesis that can efficiently produce semantic-level multimodal results still remains a challenge. In this paper, we propose a novel diverse semantic image synthesis framework from the perspective of semantic class distributions, which naturally supports diverse generation at the semantic or even instance level. We achieve this by modeling class-level conditional modulation parameters as continuous probability distributions instead of discrete values, and by sampling per-instance modulation parameters through instance-adaptive stochastic sampling that is consistent across the network. Moreover, we propose prior noise remapping, through linear perturbation parameters encoded from paired references, to facilitate supervised training and exemplar-based instance style control at test time. Extensive experiments on multiple datasets show that our method achieves superior diversity and comparable quality compared to state-of-the-art methods. Code will be available at \url{https://github.com/tzt101/INADE.git}
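The core idea — class-level Gaussian distributions over modulation parameters, per-instance stochastic sampling, and linear noise remapping for exemplar control — can be illustrated with a minimal NumPy sketch. All names (`gamma_mu`, `sample_instance_gamma`, the toy `(a, b)` remapping values) are hypothetical stand-ins, not the paper's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, channels = 5, 8

# Hypothetical class-level distribution parameters (learned in the real model):
# each semantic class owns a Gaussian over its modulation parameters.
gamma_mu = rng.normal(size=(num_classes, channels))
gamma_sigma = np.abs(rng.normal(size=(num_classes, channels)))

def sample_instance_gamma(class_id, z=None, remap=None):
    """Sample a per-instance modulation vector for one semantic class.

    A single noise vector z per instance, reused at every layer, is what
    keeps the sampling consistent across the network.
    """
    if z is None:
        z = rng.standard_normal(channels)
    if remap is not None:
        # Prior noise remapping: z' = a*z + b, with (a, b) encoded
        # from a paired reference image in the real framework.
        a, b = remap
        z = a * z + b
    return gamma_mu[class_id] + gamma_sigma[class_id] * z

# Two instances of the same class draw different styles...
g1 = sample_instance_gamma(2)
g2 = sample_instance_gamma(2)

# ...while a reference-encoded (a, b) pins an instance to an exemplar style
# (toy constants here in place of a learned reference encoder).
a, b = 0.5 * np.ones(channels), 0.1 * np.ones(channels)
g_ref = sample_instance_gamma(2, z=np.zeros(channels), remap=(a, b))
```

Because each instance owns its own noise vector, diversity is controlled at the instance level rather than globally, and replacing the sampled noise with a remapped one switches the same machinery to exemplar-based style control.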
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image-to-Image Translation | Deep-Fashion | FID | 9.97 | INADE |
| Image-to-Image Translation | Cityscapes Labels-to-Photo | LPIPS | 0.248 | INADE |
| Image-to-Image Translation | ADE20K Labels-to-Photos | LPIPS | 0.4 | INADE |