Liqian Ma, Xu Jia, Qianru Sun, Bernt Schiele, Tinne Tuytelaars, Luc van Gool
This paper proposes the novel Pose Guided Person Generation Network (PG$^2$), which synthesizes person images in arbitrary poses from an image of that person and a novel target pose. Our generation framework PG$^2$ uses the pose information explicitly and consists of two key stages: pose integration and image refinement. In the first stage, the condition image and the target pose are fed into a U-Net-like network to generate an initial, coarse image of the person in the target pose. The second stage then refines this blurry initial result with a U-Net-like generator trained adversarially. Extensive experimental results on both 128$\times$64 re-identification images and 256$\times$256 fashion photos show that our model generates high-quality person images with convincing details.
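The two-stage data flow described above can be sketched as follows. This is only an illustrative skeleton, not the authors' implementation: `toy_g1` and `toy_g2` are hypothetical stand-ins for the two U-Net-like networks, images are toy single-channel nested lists, and passing the condition image to the second stage is an assumption made here for illustration.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]       # toy single-channel image (rows of pixel values)
Pose = List[Tuple[int, int]]    # toy pose: (row, col) keypoints (hypothetical encoding)


def toy_g1(condition: Image, target_pose: Pose) -> Image:
    """Hypothetical stage-1 stand-in: returns a flat, blurry image of the same size.

    The pose is ignored here; a real network would condition on it.
    """
    pixels = [p for row in condition for p in row]
    mean = sum(pixels) / len(pixels)
    return [[mean for _ in row] for row in condition]


def toy_g2(condition: Image, coarse: Image) -> Image:
    """Hypothetical stage-2 stand-in: blends detail from the condition image into the coarse result."""
    return [
        [(c + k) / 2.0 for c, k in zip(cond_row, coarse_row)]
        for cond_row, coarse_row in zip(condition, coarse)
    ]


def generate(
    condition: Image,
    target_pose: Pose,
    g1: Callable[[Image, Pose], Image],
    g2: Callable[[Image, Image], Image],
) -> Tuple[Image, Image]:
    """Run stage 1 (pose integration), then stage 2 (image refinement)."""
    coarse = g1(condition, target_pose)   # initial, coarse image in the target pose
    refined = g2(condition, coarse)       # refined result
    return coarse, refined


condition_image = [[0.0, 1.0], [1.0, 0.0]]
pose = [(0, 0), (1, 1)]
coarse_image, refined_image = generate(condition_image, pose, toy_g1, toy_g2)
```

In a real system the two callables would be trained networks, with the second one trained against a discriminator; the skeleton only fixes the order in which the stages consume their inputs.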
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Generation | Deep-Fashion | IS | 3.09 | PG$^2$ |
| Image Generation | Deep-Fashion | SSIM | 0.762 | PG$^2$ |
| Hand | NTU Hand Digit | AMT | 3.5 | PG$^2$ |
| Hand | NTU Hand Digit | IS | 2.4152 | PG$^2$ |
| Hand | NTU Hand Digit | PSNR | 28.2403 | PG$^2$ |
| Hand | Senz3D | AMT | 2.8 | PG$^2$ |
| Hand | Senz3D | IS | 3.3699 | PG$^2$ |
| Hand | Senz3D | PSNR | 26.5138 | PG$^2$ |
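
For readers unfamiliar with the PSNR rows in the table, the standard definition is $10 \log_{10}(\text{peak}^2 / \text{MSE})$, in decibels. The sketch below assumes 8-bit images (peak value 255) stored as nested lists; it is not the evaluation code used to produce the numbers above, and `psnr` is a name chosen here for illustration.

```python
import math
from typing import List

Image = List[List[float]]  # toy single-channel image (rows of pixel values)


def psnr(reference: Image, generated: Image, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    ref = [p for row in reference for p in row]
    gen = [p for row in generated for p in row]
    if not ref or len(ref) != len(gen):
        raise ValueError("images must be non-empty and have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref, gen)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


example_db = psnr([[100.0, 100.0]], [[110.0, 90.0]])  # MSE is 100 here
```

Higher values indicate a generated image closer to the reference.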