Badour AlBahar, Jia-Bin Huang
We address the problem of guided image-to-image translation, where we translate an input image into another while respecting the constraints provided by an external, user-provided guidance image. Various conditioning methods for leveraging the given guidance image have been explored, including input concatenation, feature concatenation, and conditional affine transformation of feature activations. All these conditioning mechanisms, however, are uni-directional: information flows only from the guidance to the input image, with no feedback from the input image to the guidance. To better utilize the constraints of the guidance image, we present a bi-directional feature transformation (bFT) scheme. We show that our bFT scheme outperforms other conditioning schemes and achieves results comparable to state-of-the-art methods on different tasks.
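To make the difference between uni- and bi-directional conditioning concrete, here is a minimal NumPy sketch of one bi-directional feature transformation step. All names (`affine_params`, `bft_layer`) and the per-channel, globally pooled parameter generator are illustrative assumptions, not the paper's architecture; the actual method may use learned convolutions and spatially varying scale/shift parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def affine_params(feat, out_ch):
    # Hypothetical parameter generator: global average pooling followed by
    # a random linear map, standing in for a learned network that emits
    # per-channel scale (gamma) and shift (beta) parameters.
    W = rng.standard_normal((feat.shape[0], 2 * out_ch)) * 0.1
    p = feat.mean(axis=(1, 2)) @ W
    gamma, beta = p[:out_ch], p[out_ch:]
    return gamma[:, None, None], beta[:, None, None]

def bft_layer(x, g):
    """One bi-directional feature transformation step (sketch).

    x: input-branch features,    shape (C, H, W)
    g: guidance-branch features, shape (C, H, W)

    Each branch modulates the other with an affine transformation.
    Uni-directional conditioning (e.g., FiLM-style) would compute only
    the guidance -> input direction and leave g unchanged.
    """
    gx, bx = affine_params(g, x.shape[0])   # guidance -> input
    gg, bg = affine_params(x, g.shape[0])   # input -> guidance
    x_new = (1 + gx) * x + bx               # modulate input features
    g_new = (1 + gg) * g + bg               # modulate guidance features
    return x_new, g_new

x = rng.standard_normal((8, 16, 16))
g = rng.standard_normal((8, 16, 16))
x2, g2 = bft_layer(x, g)
```

The key point the sketch illustrates is that `g_new` depends on `x`, so the guidance representation is updated by the input image at every layer, rather than being a fixed one-way conditioning signal.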
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Generation | Deep-Fashion | FID | 12.266 | bFT |
| Image Generation | Deep-Fashion | IS | 3.22 | bFT |
| Image Generation | Deep-Fashion | SSIM | 0.767 | bFT |
| Image Reconstruction | Edge-to-Clothes | FID | 58.4 | bFT |
| Image Reconstruction | Edge-to-Clothes | LPIPS | 0.1 | bFT |
| Image Reconstruction | Edge-to-Handbags | FID | 74.9 | bFT |
| Image Reconstruction | Edge-to-Handbags | LPIPS | 0.2 | bFT |
| Image Reconstruction | Edge-to-Shoes | FID | 121.2 | bFT |
| Image Reconstruction | Edge-to-Shoes | LPIPS | 0.1 | bFT |