Shunsuke Saito, Zeng Huang, Ryota Natsume, Shigeo Morishima, Angjoo Kanazawa, Hao Li
We introduce Pixel-aligned Implicit Function (PIFu), a highly effective implicit representation that locally aligns pixels of 2D images with the global context of their corresponding 3D object. Using PIFu, we propose an end-to-end deep learning method for digitizing highly detailed clothed humans that can infer both 3D surface and texture from a single image and, optionally, from multiple input images. Highly intricate shapes, such as hairstyles and clothing, as well as their variations and deformations, can be digitized in a unified way. Compared to existing representations used for 3D deep learning, PIFu can produce high-resolution surfaces, including largely unseen regions such as the back of a person. In particular, unlike the voxel representation, it is memory efficient, it can handle arbitrary topology, and the resulting surface is spatially aligned with the input image. Furthermore, while previous techniques are designed to process either a single image or multiple views, PIFu extends naturally to an arbitrary number of views. We demonstrate high-resolution and robust reconstructions on real-world images from the DeepFashion dataset, which contains a variety of challenging clothing types. Our method achieves state-of-the-art performance on a public benchmark and outperforms prior work on clothed human digitization from a single image.
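
To make the idea of pixel alignment concrete, the following is a minimal sketch of a single-view query, not the authors' implementation. It assumes an orthographic camera with coordinates normalized to [-1, 1], a feature map stored as nested lists, and a hand-written stand-in (`toy_occupancy`) in place of a learned network. All function names here (`bilinear_sample`, `project_orthographic`, `pifu_query`, `toy_occupancy`) are hypothetical and chosen for illustration only.

```python
import math


def bilinear_sample(feat, u, v):
    """Bilinearly interpolate feat[row][col] (a list of channels) at pixel (u, v)."""
    h, w = len(feat), len(feat[0])
    # Clamp to the image so queries just outside the frame stay valid.
    u = min(max(u, 0.0), w - 1.0)
    v = min(max(v, 0.0), h - 1.0)
    x0, y0 = int(math.floor(u)), int(math.floor(v))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = u - x0, v - y0
    return [
        (1 - ax) * (1 - ay) * feat[y0][x0][c]
        + ax * (1 - ay) * feat[y0][x1][c]
        + (1 - ax) * ay * feat[y1][x0][c]
        + ax * ay * feat[y1][x1][c]
        for c in range(len(feat[0][0]))
    ]


def project_orthographic(point, width, height):
    """Assumed camera: x, y in [-1, 1] map to pixel coordinates, z is kept as depth."""
    x, y, z = point
    u = (x + 1.0) * 0.5 * (width - 1)
    v = (1.0 - (y + 1.0) * 0.5) * (height - 1)  # image rows grow downward
    return u, v, z


def pifu_query(point, feat, implicit_fn):
    """Evaluate an implicit function on the feature at the point's pixel plus its depth."""
    h, w = len(feat), len(feat[0])
    u, v, z = project_orthographic(point, w, h)
    local_feature = bilinear_sample(feat, u, v)
    return implicit_fn(local_feature, z)


def toy_occupancy(local_feature, depth):
    """Stand-in for a learned function: channel 0 is treated as a half-thickness."""
    return 1.0 if abs(depth) < local_feature[0] else 0.0


# A 2x2 feature map with one channel per pixel (illustrative values only).
feat = [[[0.0], [1.0]], [[2.0], [3.0]]]
print(pifu_query((0.0, 0.0, 0.3), feat, toy_occupancy))
print(pifu_query((0.0, 0.0, 2.0), feat, toy_occupancy))
```

Because the function is evaluated per 3D point rather than stored on a grid, memory use in this sketch depends on how many points are queried, not on a voxel resolution, which is one way to read the memory-efficiency claim in the abstract.
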
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Reconstruction | CustomHumans | Chamfer Distance P-to-S | 2.209 | PIFu |
| Reconstruction | CustomHumans | Chamfer Distance S-to-P | 2.582 | PIFu |
| Reconstruction | CustomHumans | Normal Consistency | 0.805 | PIFu |
| Reconstruction | CustomHumans | f-Score | 34.881 | PIFu |
| Reconstruction | CAPE | Chamfer (cm) | 3.573 | PIFu (THuman2.0) |
| Reconstruction | CAPE | NC | 0.186 | PIFu (THuman2.0) |
| Reconstruction | CAPE | P2S (cm) | 1.483 | PIFu (THuman2.0) |
| Reconstruction | 4D-DRESS | Chamfer (cm) | 2.696 | PIFu_Inner |
| Reconstruction | 4D-DRESS | IoU | 0.69 | PIFu_Inner |
| Reconstruction | 4D-DRESS | Normal Consistency | 0.792 | PIFu_Inner |
| Reconstruction | 4D-DRESS | Chamfer (cm) | 2.783 | PIFu_Outer |
| Reconstruction | 4D-DRESS | IoU | 0.697 | PIFu_Outer |
| Reconstruction | 4D-DRESS | Normal Consistency | 0.759 | PIFu_Outer |
| Object Reconstruction | RenderPeople | Chamfer (cm) | 0.567 | PIFu (3 views) |
| Object Reconstruction | RenderPeople | Point-to-surface distance (cm) | 0.554 | PIFu (3 views) |
| Object Reconstruction | RenderPeople | Surface normal consistency | 0.094 | PIFu (3 views) |
| Object Reconstruction | RenderPeople | Chamfer (cm) | 1.5 | PIFu |
| Object Reconstruction | RenderPeople | Point-to-surface distance (cm) | 1.52 | PIFu |
| Object Reconstruction | RenderPeople | Surface normal consistency | 0.084 | PIFu |
| Object Reconstruction | BUFF | Chamfer (cm) | 1.14 | PIFu |
| Object Reconstruction | BUFF | Point-to-surface distance (cm) | 1.15 | PIFu |
| Object Reconstruction | BUFF | Surface normal consistency | 0.0928 | PIFu |
| 3D Object Reconstruction | RenderPeople | Chamfer (cm) | 0.567 | PIFu (3 views) |
| 3D Object Reconstruction | RenderPeople | Point-to-surface distance (cm) | 0.554 | PIFu (3 views) |
| 3D Object Reconstruction | RenderPeople | Surface normal consistency | 0.094 | PIFu (3 views) |
| 3D Object Reconstruction | RenderPeople | Chamfer (cm) | 1.5 | PIFu |
| 3D Object Reconstruction | RenderPeople | Point-to-surface distance (cm) | 1.52 | PIFu |
| 3D Object Reconstruction | RenderPeople | Surface normal consistency | 0.084 | PIFu |
| 3D Object Reconstruction | BUFF | Chamfer (cm) | 1.14 | PIFu |
| 3D Object Reconstruction | BUFF | Point-to-surface distance (cm) | 1.15 | PIFu |
| 3D Object Reconstruction | BUFF | Surface normal consistency | 0.0928 | PIFu |
| Lifelike 3D Human Generation | THuman2.0 Dataset | CLIP Similarity | 0.8501 | PIFu |
| Lifelike 3D Human Generation | THuman2.0 Dataset | LPIPS | 0.1615 | PIFu |
| Lifelike 3D Human Generation | THuman2.0 Dataset | PSNR | 15.0248 | PIFu |
| Lifelike 3D Human Generation | THuman2.0 Dataset | SSIM | 0.8884 | PIFu |
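
The distance metrics in the table can be approximated on sampled point sets. The sketch below is an illustration under stated assumptions, not the evaluation code of any listed benchmark: it uses brute-force nearest neighbours, treats the ground-truth surface as a point sample, and defines the symmetric Chamfer distance as the average of the two directed distances (some benchmarks use the sum, or squared distances). The names `directed_distance` and `chamfer_distance` are hypothetical.

```python
import math


def directed_distance(src, dst):
    """Mean distance from each point in src to its nearest neighbour in dst."""
    return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)


def chamfer_distance(a, b):
    """Symmetric Chamfer distance, taken here as the average of both directions."""
    return 0.5 * (directed_distance(a, b) + directed_distance(b, a))


# Toy point sets standing in for samples of a predicted and a ground-truth surface.
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]

print(directed_distance(pred, gt))  # prediction-to-ground-truth direction
print(directed_distance(gt, pred))  # ground-truth-to-prediction direction
print(chamfer_distance(pred, gt))
```

In this sketch, the directed distance from prediction samples to ground truth plays the role of a point-to-surface style metric; reported values also depend on units, sampling density, and mesh alignment, so numbers from different benchmarks in the table are not directly comparable.
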