Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros
We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Indeed, since the release of the pix2pix software associated with this paper, a large number of internet users (many of them artists) have posted their own experiments with our system, further demonstrating its wide applicability and ease of adoption without the need for parameter tweaking. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.
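
The idea of learning the loss alongside the mapping can be made concrete with the conditional GAN objective as it is usually written for this method. The notation is an assumption added here for illustration and does not appear in the abstract: $x$ is the input image, $y$ the target image, $z$ a noise vector, $G$ the generator, $D$ the discriminator, and $\lambda$ a weighting coefficient on an L1 reconstruction term.

$$
\begin{aligned}
\mathcal{L}_{cGAN}(G, D) &= \mathbb{E}_{x,y}\big[\log D(x, y)\big] + \mathbb{E}_{x,z}\big[\log\big(1 - D(x, G(x, z))\big)\big] \\
\mathcal{L}_{L1}(G) &= \mathbb{E}_{x,y,z}\big[\lVert y - G(x, z) \rVert_1\big] \\
G^{*} &= \arg\min_{G}\max_{D}\; \mathcal{L}_{cGAN}(G, D) + \lambda\, \mathcal{L}_{L1}(G)
\end{aligned}
$$

Because $D$ is trained jointly with $G$, the adversarial term acts as a learned, task-adapted loss, while the L1 term keeps outputs close to the ground truth.
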
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image-to-Image Translation | Aerial-to-Map | Class IOU | 0.26 | cGAN |
| Image-to-Image Translation | FLIR | PSNR | 4.19 | pix2pix |
| Image-to-Image Translation | FLIR | SSIM | 0.05 | pix2pix |
| Image-to-Image Translation | Cityscapes Labels-to-Photo | Class IOU | 0.18 | pix2pix |
| Image-to-Image Translation | Cityscapes Labels-to-Photo | Per-class Accuracy | 25 | pix2pix |
| Image-to-Image Translation | Cityscapes Labels-to-Photo | Per-pixel Accuracy | 71 | pix2pix |
| Image-to-Image Translation | Cityscapes Photo-to-Labels | Class IOU | 0.32 | pix2pix |
| Image-to-Image Translation | Dayton (64×64) - ground-to-aerial | SSIM | 0.3675 | pix2pix |
| Image-to-Image Translation | CVUSA | SSIM | 0.3923 | pix2pix |
| Image-to-Image Translation | Dayton (64×64) - aerial-to-ground | SSIM | 0.4808 | pix2pix |
| Image-to-Image Translation | Ego2Top | SSIM | 0.2213 | pix2pix |
| Image-to-Image Translation | Dayton (256×256) - ground-to-aerial | SSIM | 0.2693 | pix2pix |
| Image-to-Image Translation | Dayton (256×256) - aerial-to-ground | SSIM | 0.418 | pix2pix |
| Image-to-Image Translation | Fundus Fluorescein Angiogram Photographs & Colour Fundus Images of Diabetic Patients | FID | 48.6 | pix2pix |
| Medical Image Segmentation | Cell17 | Dice | 0.6351 | pix2pix |
| Medical Image Segmentation | Cell17 | F1-score | 0.6208 | pix2pix |
| Medical Image Segmentation | Cell17 | Hausdorff | 19.1441 | pix2pix |
| Image Generation | Aerial-to-Map | Class IOU | 0.26 | cGAN |
| Image Generation | FLIR | PSNR | 4.19 | pix2pix |
| Image Generation | FLIR | SSIM | 0.05 | pix2pix |
| Image Generation | Cityscapes Labels-to-Photo | Class IOU | 0.18 | pix2pix |
| Image Generation | Cityscapes Labels-to-Photo | Per-class Accuracy | 25 | pix2pix |
| Image Generation | Cityscapes Labels-to-Photo | Per-pixel Accuracy | 71 | pix2pix |
| Image Generation | Cityscapes Photo-to-Labels | Class IOU | 0.32 | pix2pix |
| Image Generation | Dayton (64×64) - ground-to-aerial | SSIM | 0.3675 | pix2pix |
| Image Generation | CVUSA | SSIM | 0.3923 | pix2pix |
| Image Generation | Dayton (64×64) - aerial-to-ground | SSIM | 0.4808 | pix2pix |
| Image Generation | Ego2Top | SSIM | 0.2213 | pix2pix |
| Image Generation | Dayton (256×256) - ground-to-aerial | SSIM | 0.2693 | pix2pix |
| Image Generation | Dayton (256×256) - aerial-to-ground | SSIM | 0.418 | pix2pix |
| Image Generation | Fundus Fluorescein Angiogram Photographs & Colour Fundus Images of Diabetic Patients | FID | 48.6 | pix2pix |
| Image Reconstruction | Edge-to-Handbags | FID | 96.31 | pix2pix |
| Image Reconstruction | Edge-to-Handbags | LPIPS | 0.234 | pix2pix |
| Image Reconstruction | Edge-to-Shoes | FID | 197.492 | pix2pix |
| Image Reconstruction | Edge-to-Shoes | LPIPS | 0.238 | pix2pix |
| Colorization | ImageNet val | FID-5K | 24.41 | cGAN |
| 1 Image, 2*2 Stitching | Aerial-to-Map | Class IOU | 0.26 | cGAN |
| 1 Image, 2*2 Stitching | FLIR | PSNR | 4.19 | pix2pix |
| 1 Image, 2*2 Stitching | FLIR | SSIM | 0.05 | pix2pix |
| 1 Image, 2*2 Stitching | Cityscapes Labels-to-Photo | Class IOU | 0.18 | pix2pix |
| 1 Image, 2*2 Stitching | Cityscapes Labels-to-Photo | Per-class Accuracy | 25 | pix2pix |
| 1 Image, 2*2 Stitching | Cityscapes Labels-to-Photo | Per-pixel Accuracy | 71 | pix2pix |
| 1 Image, 2*2 Stitching | Cityscapes Photo-to-Labels | Class IOU | 0.32 | pix2pix |
| 1 Image, 2*2 Stitching | Dayton (64×64) - ground-to-aerial | SSIM | 0.3675 | pix2pix |
| 1 Image, 2*2 Stitching | CVUSA | SSIM | 0.3923 | pix2pix |
| 1 Image, 2*2 Stitching | Dayton (64×64) - aerial-to-ground | SSIM | 0.4808 | pix2pix |
| 1 Image, 2*2 Stitching | Ego2Top | SSIM | 0.2213 | pix2pix |
| 1 Image, 2*2 Stitching | Dayton (256×256) - ground-to-aerial | SSIM | 0.2693 | pix2pix |
| 1 Image, 2*2 Stitching | Dayton (256×256) - aerial-to-ground | SSIM | 0.418 | pix2pix |
| 1 Image, 2*2 Stitching | Fundus Fluorescein Angiogram Photographs & Colour Fundus Images of Diabetic Patients | FID | 48.6 | pix2pix |
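
The Cityscapes and Aerial-to-Map rows report per-pixel accuracy, per-class accuracy, and class IOU. As a minimal sketch of how such scores are typically derived from a confusion matrix, the snippet below computes all three for flat lists of integer labels. The function names `confusion_matrix` and `segmentation_scores` are hypothetical, and skipping classes that never occur is an assumption rather than the benchmark's documented protocol. The scores come back as fractions in [0, 1], whereas the accuracy values in the table appear to be percentages.

```python
def confusion_matrix(pred, target, num_classes):
    """Count label pairs: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        cm[t][p] += 1
    return cm


def segmentation_scores(cm):
    """Return (per-pixel accuracy, per-class accuracy, class IOU) as fractions."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))

    class_accs, class_ious = [], []
    for i in range(n):
        true_pos = cm[i][i]
        gt_count = sum(cm[i])                        # pixels whose true class is i
        pred_count = sum(cm[j][i] for j in range(n)) # pixels predicted as class i
        union = gt_count + pred_count - true_pos
        # Assumption: classes absent from the ground truth (or from both
        # prediction and ground truth) are skipped rather than scored as zero.
        if gt_count > 0:
            class_accs.append(true_pos / gt_count)
        if union > 0:
            class_ious.append(true_pos / union)

    per_pixel_acc = correct / total
    per_class_acc = sum(class_accs) / len(class_accs)
    class_iou = sum(class_ious) / len(class_ious)
    return per_pixel_acc, per_class_acc, class_iou


if __name__ == "__main__":
    target = [0, 0, 1, 1]
    pred = [0, 1, 1, 1]
    print(segmentation_scores(confusion_matrix(pred, target, num_classes=2)))
```

For labels-to-photo tasks, the predicted labels would come from running a pretrained segmentation network on the generated photos, so the scores measure how recognizable the synthesized content is rather than pixel-level fidelity.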