Wenyan Cong, Xinhao Tao, Li Niu, Jing Liang, Xuesong Gao, Qihao Sun, Liqing Zhang
Given a composite image, image harmonization aims to adjust the foreground to make it compatible with the background. High-resolution image harmonization is in high demand but remains unexplored. Conventional image harmonization methods learn a global RGB-to-RGB transformation, which scales effortlessly to high resolution but ignores diverse local context. Recent deep learning methods learn a dense pixel-to-pixel transformation, which can generate harmonious outputs but is highly constrained to low resolution. In this work, we propose a high-resolution image harmonization network with Collaborative Dual Transformation (CDTNet) to combine pixel-to-pixel transformation and RGB-to-RGB transformation coherently in an end-to-end network. Our CDTNet consists of a low-resolution generator for pixel-to-pixel transformation, a color mapping module for RGB-to-RGB transformation, and a refinement module to take advantage of both. Extensive experiments on a high-resolution benchmark dataset and on our created high-resolution real composite images demonstrate that CDTNet strikes a good balance between efficiency and effectiveness. The datasets we use can be found at https://github.com/bcmi/CDTNet-High-Resolution-Image-Harmonization.
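To illustrate why a global RGB-to-RGB transformation scales to any resolution, the sketch below applies one color map to every foreground pixel of a composite. It is only an illustration under stated assumptions: the affine map (`matrix`, `bias`) is a hypothetical stand-in and is not claimed to be the color mapping module of CDTNet, and the function name `apply_global_color_map` is invented for this example.

```python
import numpy as np

def apply_global_color_map(image, mask, matrix, bias):
    """Apply one affine RGB-to-RGB map to every foreground pixel.

    image:  float array of shape (H, W, 3) with values in [0, 1]
    mask:   array of shape (H, W), 1 = foreground, 0 = background
    matrix: (3, 3) array, bias: (3,) array; hypothetical color-map parameters
    """
    # The same map is applied to each pixel, independent of its position.
    mapped = np.clip(image @ matrix.T + bias, 0.0, 1.0)
    m = mask[..., None].astype(image.dtype)
    # Background pixels are kept untouched; only the foreground is recolored.
    return m * mapped + (1.0 - m) * image

rng = np.random.default_rng(0)
matrix = np.diag([1.1, 1.0, 0.9])      # illustrative warm tint (assumption)
bias = np.array([0.02, 0.0, -0.02])

# The parameter count does not depend on the image size.
for size in (256, 1024):
    image = rng.random((size, size, 3))
    mask = np.zeros((size, size))
    mask[size // 4 : size // 2, size // 4 : size // 2] = 1
    out = apply_global_color_map(image, mask, matrix, bias)
    assert out.shape == image.shape
```

Because the map has no spatial dependence, its cost grows only with the pixel count, but it also cannot react to local context. That limitation is what the abstract's pixel-to-pixel branch is meant to address.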
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Generation | iHarmony4 | MSE | 23.75 | CDTNet |
| Image Generation | iHarmony4 | PSNR | 38.23 | CDTNet |
| Image Generation | HAdobe5k (1024×1024) | MSE | 21.24 | CDTNet |
| Image Generation | HAdobe5k (1024×1024) | PSNR | 38.77 | CDTNet |
| Image Generation | HAdobe5k (1024×1024) | SSIM | 0.9868 | CDTNet |
| Image Generation | HAdobe5k (1024×1024) | fMSE | 152.13 | CDTNet |
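The metrics in the table can be sketched as follows. This is a minimal illustration, not the evaluation code behind the reported numbers. It assumes pixel values in the 8-bit range [0, 255] and takes fMSE to mean MSE averaged over foreground pixels only, which is the usual convention in harmonization work but is not spelled out here. The function names are chosen for this example.

```python
import numpy as np

def mse(pred, target):
    """Mean squared error over all pixels and channels."""
    return float(np.mean((pred - target) ** 2))

def psnr(pred, target, max_value=255.0):
    """Peak signal-to-noise ratio in dB; max_value=255 assumes 8-bit images."""
    err = mse(pred, target)
    if err == 0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / err))

def fmse(pred, target, mask):
    """MSE restricted to foreground pixels (mask != 0); assumed definition."""
    fg = mask.astype(bool)
    if not fg.any():
        raise ValueError("mask contains no foreground pixels")
    # Boolean indexing with an (H, W) mask selects whole RGB triplets.
    return float(np.mean((pred[fg] - target[fg]) ** 2))

# Toy example: a 4x4 image whose 2x2 foreground is off by 10 intensity levels.
target = np.zeros((4, 4, 3))
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1
pred = target.copy()
pred[1:3, 1:3] += 10.0

print(mse(pred, target), fmse(pred, target, mask), psnr(pred, target))
```

In this toy case the foreground covers a quarter of the image and the background matches exactly, so fMSE is four times the MSE. The same effect explains why fMSE values are typically much larger than MSE values when the foreground is small.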