Hao Tang, Dan Xu, Yan Yan, Philip H. S. Torr, Nicu Sebe
In this paper, we address the task of semantic-guided scene generation. One open challenge in scene generation is the difficulty of generating small objects and detailed local textures, which has been widely observed in global image-level generation methods. To tackle this issue, we learn scene generation in a local context and correspondingly design a local class-specific generative network guided by semantic maps, which separately constructs and learns sub-generators that concentrate on generating different classes and can thus provide richer scene details. To learn more discriminative class-specific feature representations for the local generation, a novel classification module is also proposed. To combine the advantages of both global image-level and local class-specific generation, a joint generation network is designed with an embedded attention fusion module and a dual-discriminator structure. Extensive experiments on two scene image generation tasks show the superior generation performance of the proposed model, which establishes state-of-the-art results by large margins on challenging public benchmarks for both tasks. The source code and trained models are available at https://github.com/Ha0Tang/LGGAN.
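The attention fusion described above can be illustrated with a minimal sketch. The following is an assumption about the general mechanism, not the paper's exact implementation: per-pixel attention logits are softmax-normalized across the two branches so that the weights sum to one at every pixel, and the fused image is the weighted combination of the global and local generator outputs. In LGGAN itself the attention maps are produced by learned convolutions; here they are simply passed in as an array.

```python
import numpy as np

def attention_fusion(global_out: np.ndarray,
                     local_out: np.ndarray,
                     attn_logits: np.ndarray) -> np.ndarray:
    """Fuse global and local generator outputs with per-pixel attention.

    global_out, local_out: (H, W, C) branch outputs.
    attn_logits: (H, W, 2) unnormalized attention scores, one channel
    per branch. A softmax over the last axis turns them into weights
    that sum to 1 at each pixel.
    """
    # Numerically stable softmax over the branch axis.
    e = np.exp(attn_logits - attn_logits.max(axis=-1, keepdims=True))
    attn = e / e.sum(axis=-1, keepdims=True)
    # Per-pixel convex combination of the two branches.
    return attn[..., 0:1] * global_out + attn[..., 1:2] * local_out
```

With zero logits the softmax yields equal weights of 0.5, so the fusion reduces to a plain average of the two branches; learned logits instead let the network favor the local branch around small objects and fine textures.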
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image-to-Image Translation | CVUSA | KL divergence (↓) | 2.55 | LGGAN |
| Image-to-Image Translation | CVUSA | PSNR (↑) | 22.5766 | LGGAN |
| Image-to-Image Translation | CVUSA | SD (↑) | 19.744 | LGGAN |
| Image-to-Image Translation | CVUSA | SSIM (↑) | 0.5238 | LGGAN |
| Image-to-Image Translation | Dayton (256×256) - aerial-to-ground | KL divergence (↓) | 2.18 | LGGAN |
| Image-to-Image Translation | Dayton (256×256) - aerial-to-ground | PSNR (↑) | 22.9949 | LGGAN |
| Image-to-Image Translation | Dayton (256×256) - aerial-to-ground | SD (↑) | 19.6145 | LGGAN |
| Image-to-Image Translation | Dayton (256×256) - aerial-to-ground | SSIM (↑) | 0.5457 | LGGAN |