
Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image

Siyuan Huang, Siyuan Qi, Yixin Zhu, Yinxue Xiao, Yuanlu Xu, Song-Chun Zhu

Published: 2018-08-07, ECCV 2018
Tasks: Scene Parsing, Monocular 3D Object Detection, Room Layout Estimation, Scene Understanding, Semantic Segmentation, Object Localization, object-detection, 3D Object Detection, Object Detection

Abstract

We propose a computational framework to jointly parse a single RGB image and reconstruct a holistic 3D configuration composed by a set of CAD models using a stochastic grammar model. Specifically, we introduce a Holistic Scene Grammar (HSG) to represent the 3D scene structure, which characterizes a joint distribution over the functional and geometric space of indoor scenes. The proposed HSG captures three essential and often latent dimensions of the indoor scenes: i) latent human context, describing the affordance and the functionality of a room arrangement, ii) geometric constraints over the scene configurations, and iii) physical constraints that guarantee physically plausible parsing and reconstruction. We solve this joint parsing and reconstruction problem in an analysis-by-synthesis fashion, seeking to minimize the differences between the input image and the rendered images generated by our 3D representation, over the space of depth, surface normal, and object segmentation map. The optimal configuration, represented by a parse graph, is inferred using Markov chain Monte Carlo (MCMC), which efficiently traverses through the non-differentiable solution space, jointly optimizing object localization, 3D layout, and hidden human context. Experimental results demonstrate that the proposed algorithm improves the generalization ability and significantly outperforms prior methods on 3D layout estimation, 3D object detection, and holistic scene understanding.
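The MCMC inference described in the abstract can be illustrated with a minimal Metropolis-Hastings sketch. This is not the authors' implementation: the `energy` function below is a toy stand-in for the render-and-compare cost over depth, surface normal, and segmentation maps, the integer vector stands in for a parse graph, and every name (`energy`, `propose`, `metropolis_hastings`, `temperature`) is hypothetical. It only shows why a sampler of this kind needs no gradients: it evaluates the cost of a proposed configuration and accepts or rejects it.

```python
import math
import random


def energy(config, target):
    """Toy stand-in for a render-and-compare cost (assumption, not the paper's cost)."""
    return sum(abs(c - t) for c, t in zip(config, target))


def propose(config, rng, step=1):
    """Symmetric proposal: nudge one randomly chosen coordinate by +/- step."""
    new = list(config)
    i = rng.randrange(len(new))
    new[i] += rng.choice((-step, step))
    return new


def metropolis_hastings(init, target, n_iters=5000, temperature=0.5, seed=0):
    """Run Metropolis-Hastings and return the lowest-energy configuration visited."""
    rng = random.Random(seed)
    current = list(init)
    current_e = energy(current, target)
    best, best_e = list(current), current_e
    for _ in range(n_iters):
        candidate = propose(current, rng)
        candidate_e = energy(candidate, target)
        # Accept with probability min(1, exp(-(E' - E) / T)).
        # The proposal is symmetric, so no proposal-ratio correction is needed.
        delta = candidate_e - current_e
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_e = candidate, candidate_e
            if current_e < best_e:
                best, best_e = list(current), current_e
    return best, best_e


if __name__ == "__main__":
    best, best_e = metropolis_hastings([0, 0, 0], [3, -2, 5])
    print(best, best_e)
```

Because the sampler only ever calls `energy` on whole configurations, the same loop would work with discrete moves such as adding, removing, or swapping objects, which is the kind of non-differentiable solution space the abstract refers to.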

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Object Detection | SUN RGB-D | AP@0.15 (10 / NYU-37) | 14.01 | Holistic |
| Object Detection | SUN RGB-D | AP@0.15 (10 / PNet-30) | 14.01 | Holistic |
| 3D | SUN RGB-D | AP@0.15 (10 / NYU-37) | 14.01 | Holistic |
| 3D | SUN RGB-D | AP@0.15 (10 / PNet-30) | 14.01 | Holistic |
| 3D Object Detection | SUN RGB-D | AP@0.15 (10 / NYU-37) | 14.01 | Holistic |
| 3D Object Detection | SUN RGB-D | AP@0.15 (10 / PNet-30) | 14.01 | Holistic |
| 2D Classification | SUN RGB-D | AP@0.15 (10 / NYU-37) | 14.01 | Holistic |
| 2D Classification | SUN RGB-D | AP@0.15 (10 / PNet-30) | 14.01 | Holistic |
| 2D Object Detection | SUN RGB-D | AP@0.15 (10 / NYU-37) | 14.01 | Holistic |
| 2D Object Detection | SUN RGB-D | AP@0.15 (10 / PNet-30) | 14.01 | Holistic |
| 16k | SUN RGB-D | AP@0.15 (10 / NYU-37) | 14.01 | Holistic |
| 16k | SUN RGB-D | AP@0.15 (10 / PNet-30) | 14.01 | Holistic |
| Room Layout Estimation | SUN RGB-D | Camera Pitch | 7.6 | Holistic |
| Room Layout Estimation | SUN RGB-D | Camera Roll | 3.12 | Holistic |
| Room Layout Estimation | SUN RGB-D | IoU | 54.9 | Holistic |
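
The detection and layout metrics above are both built on intersection over union (IoU). As a rough illustration, the sketch below computes 3D IoU for axis-aligned boxes and checks it against a 0.15 threshold. It assumes that "AP@0.15" denotes an IoU threshold of 0.15 for counting a detection as correct, which the page does not state explicitly. Boxes in the benchmark are generally oriented, so the axis-aligned case is a simplification, and the names `box_volume` and `iou_3d` are hypothetical.

```python
def box_volume(box):
    """Volume of an axis-aligned box given as ((x0, y0, z0), (x1, y1, z1))."""
    (x0, y0, z0), (x1, y1, z1) = box
    # Clamp each extent at zero so non-overlapping boxes yield zero volume.
    return max(0.0, x1 - x0) * max(0.0, y1 - y0) * max(0.0, z1 - z0)


def iou_3d(a, b):
    """Intersection over union of two axis-aligned 3D boxes."""
    lo = tuple(max(a[0][i], b[0][i]) for i in range(3))
    hi = tuple(min(a[1][i], b[1][i]) for i in range(3))
    inter = box_volume((lo, hi))
    union = box_volume(a) + box_volume(b) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = ((0, 0, 0), (2, 2, 2))
    ground_truth = ((0, 0, 0), (2, 2, 1))
    score = iou_3d(predicted, ground_truth)
    # Under the assumed threshold, this prediction would count as a match.
    print(score, score >= 0.15)
```

Average precision is then obtained by ranking detections by confidence and integrating precision over recall, with matches decided by a test like the one above.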

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Advancing Complex Wide-Area Scene Understanding with Hierarchical Coresets Selection (2025-07-17)
Argus: Leveraging Multiview Images for Improved 3-D Scene Understanding With Large Language Models (2025-07-17)
City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)