Yining Zhao, Chao Wen, Zhou Xue, Yue Gao
In estimating 3D room layout from a single panoramic image, the significant geometric structures can be compactly described by global wireframes. Based on this observation, we present an alternative approach that estimates the walls in 3D space by modeling long-range geometric patterns in a learnable Hough Transform block. We transform the image features of each cubemap tile to the Hough space of a Manhattan world and map them directly to the geometric output. The convolutional layers not only learn local gradient-like line features but also use global information to predict occluded walls with a simple network structure. Unlike most previous work, predictions are made individually on each cubemap tile and then assembled into the layout estimate. Experimental results show that our method achieves results comparable to recent state-of-the-art methods in prediction accuracy and performance. Code is available at https://github.com/Starrah/DMH-Net.
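To make the idea of mapping a tile feature to Hough space concrete, the sketch below accumulates a 2D feature map into one vote bin per candidate line. It is an illustration only, not the DMH-Net implementation: it assumes that Manhattan alignment lets us consider just the horizontal and vertical line families on a tile, it omits any other line family and all learnable layers, and the function name `axis_aligned_hough` is hypothetical.

```python
import numpy as np

def axis_aligned_hough(feature):
    """Accumulate a 2D feature map into two 1D Hough-style vote vectors.

    Assumption (not from the paper): only horizontal and vertical lines
    are considered, so each row and each column is one Hough bin.
    """
    feature = np.asarray(feature, dtype=float)
    if feature.ndim != 2:
        raise ValueError("expected a 2D feature map of shape (H, W)")
    horizontal_votes = feature.sum(axis=1)  # one bin per row (line y = r)
    vertical_votes = feature.sum(axis=0)    # one bin per column (line x = c)
    return horizontal_votes, vertical_votes

# Toy tile: a horizontal line at row 2 with a two-pixel gap standing in
# for an occluder, plus an unbroken vertical line at column 5.
tile = np.zeros((8, 8))
tile[2, :] = 1.0
tile[2, 3:5] = 0.0
tile[:, 5] = 1.0

h, v = axis_aligned_hough(tile)
print(int(h.argmax()), int(v.argmax()))
```

Because every pixel on a line votes into the same bin, the partially hidden row still produces the strongest horizontal response. This is the sense in which a global accumulation can support predictions for occluded structure.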
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| 3D Reconstruction | Stanford2D3D Panoramic | 3DIoU | 84.93 | DMH-Net |
| 3D Reconstruction | Stanford2D3D Panoramic | Corner Error | 0.67 | DMH-Net |
| 3D Reconstruction | Stanford2D3D Panoramic | Pixel Error | 1.93 | DMH-Net |
| 3D Reconstruction | PanoContext | 3DIoU | 85.48 | DMH-Net |