Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image

Yinyu Nie, Xiaoguang Han, Shihui Guo, Yujian Zheng, Jian Chang, Jian Jun Zhang

Date: 2020-02-27 | Venue: CVPR 2020 6
Tasks: 3D Shape Reconstruction, Object Reconstruction, Monocular 3D Object Detection, Room Layout Estimation, Scene Understanding, object-detection, 3D Object Detection, Object Detection

Abstract

Semantic reconstruction of indoor scenes refers to both scene understanding and object reconstruction. Existing works either address one part of this problem or focus on independent objects. In this paper, we bridge the gap between understanding and reconstruction, and propose an end-to-end solution to jointly reconstruct room layout, object bounding boxes and meshes from a single image. Instead of separately resolving scene understanding and object reconstruction, our method builds upon a holistic scene context and proposes a coarse-to-fine hierarchy with three components: 1. room layout with camera pose; 2. 3D object bounding boxes; 3. object meshes. We argue that understanding the context of each component can assist the task of parsing the others, which enables joint understanding and reconstruction. The experiments on the SUN RGB-D and Pix3D datasets demonstrate that our method consistently outperforms existing methods in indoor layout estimation, 3D object detection and mesh reconstruction.
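
To make the three-level output described in the abstract concrete, here is a minimal sketch of a container for one reconstructed scene. It is only an illustration: every class and field name (`Box3D`, `Mesh`, `SceneObject`, `Scene`, `summarize`) and the box parameterization are hypothetical and are not taken from the paper or its official code.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Box3D:
    """Oriented 3D box. This parameterization is an assumption for illustration."""
    center: Vec3
    size: Vec3
    yaw: float  # rotation about the vertical axis, in radians


@dataclass
class Mesh:
    vertices: List[Vec3]
    faces: List[Tuple[int, int, int]]  # vertex indices of each triangle


@dataclass
class SceneObject:
    label: str
    box: Box3D                   # component 2: 3D object bounding box
    mesh: Optional[Mesh] = None  # component 3: object mesh (finest level)


@dataclass
class Scene:
    layout: Box3D        # component 1: room layout ...
    camera_pitch: float  # ... together with the camera pose (radians)
    camera_roll: float
    objects: List[SceneObject] = field(default_factory=list)


def summarize(scene: Scene) -> str:
    """Count the objects in a scene and how many of them carry a mesh."""
    meshed = sum(1 for obj in scene.objects if obj.mesh is not None)
    return f"{len(scene.objects)} objects, {meshed} with meshes"


# Hypothetical example values, not results from the paper.
triangle = Mesh(vertices=[(0, 0, 0), (1, 0, 0), (0, 1, 0)], faces=[(0, 1, 2)])
example = Scene(
    layout=Box3D(center=(0.0, 1.3, 2.0), size=(4.0, 2.6, 5.0), yaw=0.0),
    camera_pitch=0.05,
    camera_roll=0.0,
    objects=[
        SceneObject("chair", Box3D((0.5, 0.4, 2.0), (0.5, 0.8, 0.5), 0.3), triangle),
        SceneObject("table", Box3D((-0.8, 0.4, 2.5), (1.2, 0.8, 0.7), 0.0)),
    ],
)
print(summarize(example))
```

Nesting the objects under a scene that also holds the layout and camera pose mirrors the coarse-to-fine ordering in the abstract: the mesh is optional because it is the last level to be filled in.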

Results

Task | Dataset | Metric | Value | Model
Object Detection | SUN RGB-D | AP@0.15 (10 / NYU-37) | 26.38 | Total3D joint
Object Detection | SUN RGB-D | AP@0.15 (NYU-37) | 14.28 | Total3D joint
Object Detection | SUN RGB-D | AP@0.15 (10 / NYU-37) | 23.32 | Total3D w/o. joint
Object Detection | SUN RGB-D | AP@0.15 (NYU-37) | 13.25 | Total3D w/o. joint
3D | SUN RGB-D | AP@0.15 (10 / NYU-37) | 26.38 | Total3D joint
3D | SUN RGB-D | AP@0.15 (NYU-37) | 14.28 | Total3D joint
3D | SUN RGB-D | AP@0.15 (10 / NYU-37) | 23.32 | Total3D w/o. joint
3D | SUN RGB-D | AP@0.15 (NYU-37) | 13.25 | Total3D w/o. joint
3D | Pix3D | CD | 0.0836 | MGN
3D Object Detection | SUN RGB-D | AP@0.15 (10 / NYU-37) | 26.38 | Total3D joint
3D Object Detection | SUN RGB-D | AP@0.15 (NYU-37) | 14.28 | Total3D joint
3D Object Detection | SUN RGB-D | AP@0.15 (10 / NYU-37) | 23.32 | Total3D w/o. joint
3D Object Detection | SUN RGB-D | AP@0.15 (NYU-37) | 13.25 | Total3D w/o. joint
3D Shape Reconstruction | Pix3D | CD | 0.0836 | MGN
2D Classification | SUN RGB-D | AP@0.15 (10 / NYU-37) | 26.38 | Total3D joint
2D Classification | SUN RGB-D | AP@0.15 (NYU-37) | 14.28 | Total3D joint
2D Classification | SUN RGB-D | AP@0.15 (10 / NYU-37) | 23.32 | Total3D w/o. joint
2D Classification | SUN RGB-D | AP@0.15 (NYU-37) | 13.25 | Total3D w/o. joint
2D Object Detection | SUN RGB-D | AP@0.15 (10 / NYU-37) | 26.38 | Total3D joint
2D Object Detection | SUN RGB-D | AP@0.15 (NYU-37) | 14.28 | Total3D joint
2D Object Detection | SUN RGB-D | AP@0.15 (10 / NYU-37) | 23.32 | Total3D w/o. joint
2D Object Detection | SUN RGB-D | AP@0.15 (NYU-37) | 13.25 | Total3D w/o. joint
16k | SUN RGB-D | AP@0.15 (10 / NYU-37) | 26.38 | Total3D joint
16k | SUN RGB-D | AP@0.15 (NYU-37) | 14.28 | Total3D joint
16k | SUN RGB-D | AP@0.15 (10 / NYU-37) | 23.32 | Total3D w/o. joint
16k | SUN RGB-D | AP@0.15 (NYU-37) | 13.25 | Total3D w/o. joint
Room Layout Estimation | SUN RGB-D | Camera Pitch | 3.15 | Total3D joint
Room Layout Estimation | SUN RGB-D | Camera Roll | 2.09 | Total3D joint
Room Layout Estimation | SUN RGB-D | IoU | 59.2 | Total3D joint
Room Layout Estimation | SUN RGB-D | Camera Pitch | 3.68 | Total w/o. joint
Room Layout Estimation | SUN RGB-D | Camera Roll | 2.59 | Total w/o. joint
Room Layout Estimation | SUN RGB-D | IoU | 57.6 | Total w/o. joint
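
The effect of joint training can be read off the SUN RGB-D rows by subtracting each "w/o. joint" value from its "joint" counterpart. The sketch below does that arithmetic using only the values listed above. Two assumptions are made here and are not stated on this page: that Camera Pitch and Camera Roll are error metrics (lower is better), and that the "Total w/o. joint" rows refer to the same ablation as "Total3D w/o. joint". The names `RESULTS`, `joint_gain`, and `GAINS` are illustrative.

```python
# Each entry: (metric, joint value, w/o. joint value, higher_is_better).
# Values are copied from the results listed above; the higher_is_better
# flags for Camera Pitch and Camera Roll are assumptions (treated as errors).
RESULTS = [
    ("AP@0.15 (10 / NYU-37)", 26.38, 23.32, True),
    ("AP@0.15 (NYU-37)", 14.28, 13.25, True),
    ("Layout IoU", 59.2, 57.6, True),
    ("Camera Pitch", 3.15, 3.68, False),
    ("Camera Roll", 2.09, 2.59, False),
]


def joint_gain(joint: float, without: float, higher_is_better: bool) -> float:
    """Return the improvement from joint training, signed so positive is better."""
    delta = joint - without
    return round(delta if higher_is_better else -delta, 2)


GAINS = {name: joint_gain(j, w, hib) for name, j, w, hib in RESULTS}

for name, gain in GAINS.items():
    print(f"{name}: {gain:+.2f}")
```

Under these assumptions every gain comes out positive, which is consistent with the abstract's claim that joint understanding and reconstruction help each component.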

Related Papers

Advancing Complex Wide-Area Scene Understanding with Hierarchical Coresets Selection (2025-07-17)
Argus: Leveraging Multiview Images for Improved 3-D Scene Understanding With Large Language Models (2025-07-17)
City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)
Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection (2025-07-17)
Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis (2025-07-17)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)