Xingyuan Sun, Jiajun Wu, Xiuming Zhang, Zhoutong Zhang, Chengkai Zhang, Tianfan Xue, Joshua B. Tenenbaum, William T. Freeman
We study 3D shape modeling from a single image and make three contributions. First, we present Pix3D, a large-scale benchmark of diverse image-shape pairs with pixel-level 2D-3D alignment. Pix3D has wide applications in shape-related tasks, including reconstruction, retrieval, and viewpoint estimation. Building such a large-scale dataset, however, is highly challenging: existing datasets either contain only synthetic data, lack precise alignment between 2D images and 3D shapes, or have only a small number of images. Second, we calibrate the evaluation criteria for 3D shape reconstruction through behavioral studies, and use them to objectively and systematically benchmark cutting-edge reconstruction algorithms on Pix3D. Third, we design a novel model that simultaneously performs 3D reconstruction and pose estimation; our multi-task learning approach achieves state-of-the-art performance on both tasks.
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| 3D | Pix3D | CD | 0.119 | MarrNet extension (w/ Pose) |
| 3D | Pix3D | EMD | 0.118 | MarrNet extension (w/ Pose) |
| 3D | Pix3D | IoU | 0.282 | MarrNet extension (w/ Pose) |
| 3D | Pix3D | R@1 | 0.53 | MarrNet extension (w/o Pose) |
| 3D | Pix3D | R@2 | 0.62 | MarrNet extension (w/o Pose) |
| 3D | Pix3D | R@4 | 0.71 | MarrNet extension (w/o Pose) |
| 3D | Pix3D | R@8 | 0.78 | MarrNet extension (w/o Pose) |
| 3D | Pix3D | R@16 | 0.85 | MarrNet extension (w/o Pose) |
| 3D | Pix3D | R@32 | 0.9 | MarrNet extension (w/o Pose) |
| 3D Shape Reconstruction | Pix3D | CD | 0.119 | MarrNet extension (w/ Pose) |
| 3D Shape Reconstruction | Pix3D | EMD | 0.118 | MarrNet extension (w/ Pose) |
| 3D Shape Reconstruction | Pix3D | IoU | 0.282 | MarrNet extension (w/ Pose) |
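
For readers unfamiliar with the metrics in the table, the following is a minimal sketch of how IoU, Chamfer distance (CD), and recall at K (R@K) are commonly computed. It is not the authors' evaluation code: the function names, the 0.5 occupancy threshold, the unsquared and averaged form of the Chamfer distance, and the brute-force nearest-neighbor search are all assumptions made for illustration, and the paper's exact normalization and sampling may differ. EMD is omitted because it requires solving an assignment problem.

```python
import numpy as np


def voxel_iou(pred, target, threshold=0.5):
    """Intersection over union of two occupancy grids of the same shape.

    The threshold of 0.5 is an assumption, not a value taken from the paper.
    """
    p = np.asarray(pred) > threshold
    t = np.asarray(target) > threshold
    union = np.logical_or(p, t).sum()
    if union == 0:
        # Both grids are empty; treat them as identical.
        return 1.0
    return float(np.logical_and(p, t).sum() / union)


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3).

    Uses unsquared Euclidean distances and averages the two directions;
    other conventions (squared distances, summing) are also in common use.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Pairwise distance matrix of shape (N, M), built by broadcasting.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return float(0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean()))


def recall_at_k(ranked_ids, true_ids, k):
    """Fraction of queries whose true shape id is among the top-k results."""
    hits = [true in ranked[:k] for ranked, true in zip(ranked_ids, true_ids)]
    return float(np.mean(hits))
```

Under these conventions, higher IoU and R@K are better, while lower CD is better. The brute-force distance matrix uses memory proportional to N times M, so a spatial index would be preferable for dense point clouds.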