Cheng Sun, Chi-Wei Hsiao, Min Sun, Hwann-Tzong Chen
We present a new approach to the problem of estimating the 3D room layout from a single panoramic image. We represent room layout as three 1D vectors that encode, at each image column, the positions of the floor-wall and ceiling-wall boundaries and the existence of a wall-wall boundary. The proposed network, HorizonNet, trained to predict this 1D layout, outperforms previous state-of-the-art approaches. The post-processing procedure we design for recovering 3D room layouts from the 1D predictions automatically infers the room shape at low computational cost: it takes less than 20 ms per panorama, while prior works might need dozens of seconds. We also propose Pano Stretch Data Augmentation, which diversifies panorama data and can be applied to other panorama-related learning tasks. Because little data is available for non-cuboid layouts, we relabel 65 general layouts from the current dataset for fine-tuning. Our approach shows good performance on general layouts, as demonstrated by qualitative results and cross-validation.
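To make the per-column representation concrete, the following is a minimal sketch, not the authors' implementation. It builds the three 1D vectors for a synthetic square room seen from its centre and then recovers the wall distance from the floor-wall boundary alone. The function names, the room dimensions, the 1.6 m camera height, the row-to-latitude convention, and the binary wall-wall indicator are all assumptions made for illustration.

```python
import numpy as np


def encode_square_room(width, height, half_side=2.0,
                       camera_height=1.6, ceiling_above=1.2):
    """Return a (3, width) array: ceiling row, floor row, corner flag.

    Hypothetical helper: the camera sits at the centre of a square room
    whose walls are `half_side` metres away along the axes.
    """
    # Longitude of each column centre, in [-pi, pi).
    lon = ((np.arange(width) + 0.5) / width - 0.5) * 2.0 * np.pi
    # Horizontal distance from the camera to the wall seen in each column.
    dist = half_side / np.maximum(np.abs(np.cos(lon)), np.abs(np.sin(lon)))
    # Latitude of the two boundaries (floor below the horizon, ceiling above).
    floor_lat = -np.arctan2(camera_height, dist)
    ceil_lat = np.arctan2(ceiling_above, dist)
    # Latitude -> normalised image row (0 = top, 1 = bottom).
    floor_row = 0.5 - floor_lat / np.pi
    ceil_row = 0.5 - ceil_lat / np.pi
    # Wall-wall boundaries of a square room lie on the diagonals.
    corner_lon = np.array([-0.75, -0.25, 0.25, 0.75]) * np.pi
    corner_cols = np.floor((corner_lon / (2.0 * np.pi) + 0.5) * width).astype(int)
    corner = np.zeros(width)
    corner[corner_cols] = 1.0  # binary flag: a simplification assumed here
    return np.stack([ceil_row, floor_row, corner])


def decode_layout(layout, camera_height=1.6):
    """Recover per-column wall distance and ceiling height above the camera."""
    ceil_lat = (0.5 - layout[0]) * np.pi
    floor_lat = (0.5 - layout[1]) * np.pi
    dist = camera_height / np.tan(-floor_lat)
    ceiling_above = dist * np.tan(ceil_lat)
    return dist, ceiling_above


layout = encode_square_room(width=1024, height=512)
dist, ceiling_above = decode_layout(layout)
print(layout.shape)                    # three values per image column
print(np.flatnonzero(layout[2]))       # columns flagged as wall-wall boundaries
print(dist.min(), ceiling_above.mean())
```

The round trip shows why three values per column are enough to describe a layout once the camera height is fixed: the floor boundary gives the wall distance, the ceiling boundary then gives the room height, and the corner flags mark where one wall ends and the next begins.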
| Dataset | Metric | Value | Model |
|---|---|---|---|
| Stanford2D3D Panoramic | 3DIoU | 79.79 | HorizonNet |
| Stanford2D3D Panoramic | Corner Error | 0.71 | HorizonNet |
| Stanford2D3D Panoramic | Pixel Error | 2.39 | HorizonNet |
| PanoContext | 3DIoU | 82.17 | HorizonNet |

The same results are listed under each of the following tasks: 3D Reconstruction, Scene Parsing, 3D, Scene Understanding, 2D Semantic Segmentation.
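
For readers unfamiliar with the 3DIoU metric in the table, the sketch below computes the intersection over union of two volumes for the simplest case, two axis-aligned cuboids. This is an illustrative assumption: general (non-cuboid) layouts need a polygon intersection on the floor plan, and the function name `box_iou_3d` and the example boxes are hypothetical rather than taken from the paper's evaluation code.

```python
import numpy as np


def box_iou_3d(box_a, box_b):
    """3D IoU of two axis-aligned boxes given as (xmin, ymin, zmin, xmax, ymax, zmax)."""
    a = np.asarray(box_a, dtype=float)
    b = np.asarray(box_b, dtype=float)
    # Overlap extent along each axis, clipped at zero when the boxes are disjoint.
    lo = np.maximum(a[:3], b[:3])
    hi = np.minimum(a[3:], b[3:])
    inter = np.prod(np.clip(hi - lo, 0.0, None))
    vol_a = np.prod(a[3:] - a[:3])
    vol_b = np.prod(b[3:] - b[:3])
    return inter / (vol_a + vol_b - inter)


# A 4 x 4 x 3 m ground-truth room against a prediction that is 1 m too long.
ground_truth = (0.0, 0.0, 0.0, 4.0, 4.0, 3.0)
prediction = (0.0, 0.0, 0.0, 4.0, 5.0, 3.0)
print(100.0 * box_iou_3d(ground_truth, prediction))  # expressed as a percentage
```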