SceneGraphFusion: Incremental 3D Scene Graph Prediction from RGB-D Sequences

Shun-Cheng Wu, Johanna Wald, Keisuke Tateno, Nassir Navab, Federico Tombari

2021-03-27 · CVPR 2021

Tasks: Scene Graph Generation, Panoptic Segmentation, Scene Understanding, Predicate Classification, 3D Scene Graph Generation, 3D Object Classification

Abstract

Scene graphs are a compact and explicit representation successfully used in a variety of 2D scene understanding tasks. This work proposes a method to incrementally build up semantic scene graphs from a 3D environment given a sequence of RGB-D frames. To this end, we aggregate PointNet features from primitive scene components by means of a graph neural network. We also propose a novel attention mechanism well suited for partial and missing graph data present in such an incremental reconstruction scenario. Although our proposed method is designed to run on submaps of the scene, we show it also transfers to entire 3D scenes. Experiments show that our approach outperforms 3D scene graph prediction methods by a large margin and its accuracy is on par with other 3D semantic and panoptic segmentation methods while running at 35 Hz.
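The abstract mentions attention-based aggregation over a graph in which some neighbors may be partial or missing, but does not spell out the mechanism. The sketch below is a generic illustration only: it uses plain scaled dot-product attention with a visibility mask, which is an assumption and not the authors' attention design. The function name `masked_attention_aggregate` and its arguments are hypothetical.

```python
import math


def masked_attention_aggregate(query, neighbors, mask):
    """Aggregate neighbor feature vectors with softmax attention.

    query:     feature vector of the node being updated.
    neighbors: list of neighbor feature vectors (same length as query).
    mask:      list of booleans; False marks a neighbor that is missing
               in the current partial reconstruction (assumed convention).
    """
    dim = len(query)
    # Scaled dot-product score, computed for observed neighbors only.
    scored = [
        (sum(q * n for q, n in zip(query, feat)) / math.sqrt(dim), feat)
        for feat, seen in zip(neighbors, mask)
        if seen
    ]
    if not scored:
        # No observed neighbors yet: contribute a zero message.
        return [0.0] * dim
    # Numerically stable softmax over the observed neighbors.
    max_score = max(s for s, _ in scored)
    weights = [math.exp(s - max_score) for s, _ in scored]
    total = sum(weights)
    # Attention-weighted sum of the observed neighbor features.
    return [
        sum(w / total * feat[i] for w, (_, feat) in zip(weights, scored))
        for i in range(dim)
    ]


# Toy features, not taken from the paper; the third neighbor is masked out.
node = [1.0, 0.0]
nbrs = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
print(masked_attention_aggregate(node, nbrs, [True, True, False]))
```

Masking before the softmax means a missing neighbor receives no weight at all, so the remaining weights still sum to one as the graph fills in over time.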

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Scene Parsing | 3R-Scan | Top-5 Accuracy | 0.87 | SceneGraphFusion |
| Scene Parsing | 3R-Scan | Top-5 Accuracy | 0.66 | 3DSSG [Wald2020_3dssg] |
| Semantic Segmentation | ScanNet | PQ | 31.5 | SceneGraphFusion |
| Semantic Segmentation | ScanNet | PQ_st | 43.4 | SceneGraphFusion |
| Semantic Segmentation | ScanNet | PQ_th | 30.2 | SceneGraphFusion |
| Semantic Segmentation | ScanNetV2 | PQ | 31.5 | SceneGraphFusion (NN mapping) |
| Semantic Segmentation | ScanNetV2 | Params (M) | 2.9 | SceneGraphFusion (NN mapping) |
| Semantic Segmentation | ScanNetV2 | RQ | 42.2 | SceneGraphFusion (NN mapping) |
| Semantic Segmentation | ScanNetV2 | SQ | 72.9 | SceneGraphFusion (NN mapping) |
| 3D | 3R-Scan | Top-10 Accuracy | 0.8 | SceneGraphFusion |
| 3D | 3R-Scan | Top-5 Accuracy | 0.7 | SceneGraphFusion |
| 3D | 3R-Scan | Top-10 Accuracy | 0.78 | 3DSSG [Wald2020_3dssg] |
| 3D | 3R-Scan | Top-5 Accuracy | 0.68 | 3DSSG [Wald2020_3dssg] |
| Shape Representation Of 3D Point Clouds | 3R-Scan | Top-10 Accuracy | 0.8 | SceneGraphFusion |
| Shape Representation Of 3D Point Clouds | 3R-Scan | Top-5 Accuracy | 0.7 | SceneGraphFusion |
| Shape Representation Of 3D Point Clouds | 3R-Scan | Top-10 Accuracy | 0.78 | 3DSSG [Wald2020_3dssg] |
| Shape Representation Of 3D Point Clouds | 3R-Scan | Top-5 Accuracy | 0.68 | 3DSSG [Wald2020_3dssg] |
| 3D Object Classification | 3R-Scan | Top-10 Accuracy | 0.8 | SceneGraphFusion |
| 3D Object Classification | 3R-Scan | Top-5 Accuracy | 0.7 | SceneGraphFusion |
| 3D Object Classification | 3R-Scan | Top-10 Accuracy | 0.78 | 3DSSG [Wald2020_3dssg] |
| 3D Object Classification | 3R-Scan | Top-5 Accuracy | 0.68 | 3DSSG [Wald2020_3dssg] |
| 3D Point Cloud Classification | 3R-Scan | Top-10 Accuracy | 0.8 | SceneGraphFusion |
| 3D Point Cloud Classification | 3R-Scan | Top-5 Accuracy | 0.7 | SceneGraphFusion |
| 3D Point Cloud Classification | 3R-Scan | Top-10 Accuracy | 0.78 | 3DSSG [Wald2020_3dssg] |
| 3D Point Cloud Classification | 3R-Scan | Top-5 Accuracy | 0.68 | 3DSSG [Wald2020_3dssg] |
| 3D Classification | 3R-Scan | Top-10 Accuracy | 0.8 | SceneGraphFusion |
| 3D Classification | 3R-Scan | Top-5 Accuracy | 0.7 | SceneGraphFusion |
| 3D Classification | 3R-Scan | Top-10 Accuracy | 0.78 | 3DSSG [Wald2020_3dssg] |
| 3D Classification | 3R-Scan | Top-5 Accuracy | 0.68 | 3DSSG [Wald2020_3dssg] |
| 2D Semantic Segmentation | 3R-Scan | Top-5 Accuracy | 0.87 | SceneGraphFusion |
| 2D Semantic Segmentation | 3R-Scan | Top-5 Accuracy | 0.66 | 3DSSG [Wald2020_3dssg] |
| Scene Graph Generation | 3R-Scan | Top-5 Accuracy | 0.87 | SceneGraphFusion |
| Scene Graph Generation | 3R-Scan | Top-5 Accuracy | 0.66 | 3DSSG [Wald2020_3dssg] |
| 10-shot image generation | ScanNet | PQ | 31.5 | SceneGraphFusion |
| 10-shot image generation | ScanNet | PQ_st | 43.4 | SceneGraphFusion |
| 10-shot image generation | ScanNet | PQ_th | 30.2 | SceneGraphFusion |
| 10-shot image generation | ScanNetV2 | PQ | 31.5 | SceneGraphFusion (NN mapping) |
| 10-shot image generation | ScanNetV2 | Params (M) | 2.9 | SceneGraphFusion (NN mapping) |
| 10-shot image generation | ScanNetV2 | RQ | 42.2 | SceneGraphFusion (NN mapping) |
| 10-shot image generation | ScanNetV2 | SQ | 72.9 | SceneGraphFusion (NN mapping) |
| Panoptic Segmentation | ScanNet | PQ | 31.5 | SceneGraphFusion |
| Panoptic Segmentation | ScanNet | PQ_st | 43.4 | SceneGraphFusion |
| Panoptic Segmentation | ScanNet | PQ_th | 30.2 | SceneGraphFusion |
| Panoptic Segmentation | ScanNetV2 | PQ | 31.5 | SceneGraphFusion (NN mapping) |
| Panoptic Segmentation | ScanNetV2 | Params (M) | 2.9 | SceneGraphFusion (NN mapping) |
| Panoptic Segmentation | ScanNetV2 | RQ | 42.2 | SceneGraphFusion (NN mapping) |
| Panoptic Segmentation | ScanNetV2 | SQ | 72.9 | SceneGraphFusion (NN mapping) |
| 3D Point Cloud Reconstruction | 3R-Scan | Top-10 Accuracy | 0.8 | SceneGraphFusion |
| 3D Point Cloud Reconstruction | 3R-Scan | Top-5 Accuracy | 0.7 | SceneGraphFusion |
| 3D Point Cloud Reconstruction | 3R-Scan | Top-10 Accuracy | 0.78 | 3DSSG [Wald2020_3dssg] |
| 3D Point Cloud Reconstruction | 3R-Scan | Top-5 Accuracy | 0.68 | 3DSSG [Wald2020_3dssg] |
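
For readers unfamiliar with the metrics listed in the results, the following is a minimal sketch of how Top-k accuracy and the per-class panoptic quality terms (PQ, SQ, RQ) are conventionally defined. It is not the evaluation code behind these numbers; the function names and toy inputs are hypothetical, and the page does not state the exact protocol used.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this sample.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


def panoptic_quality(matched_ious, num_fp, num_fn):
    """Return (PQ, SQ, RQ) for a single class.

    matched_ious: IoU of each true-positive segment match
                  (conventionally, matches with IoU > 0.5).
    num_fp, num_fn: counts of unmatched predicted and ground-truth segments.
    """
    tp = len(matched_ious)
    denom = tp + 0.5 * num_fp + 0.5 * num_fn
    if denom == 0:
        return 0.0, 0.0, 0.0
    sq = sum(matched_ious) / tp if tp else 0.0  # segmentation quality
    rq = tp / denom                             # recognition quality
    return sq * rq, sq, rq


# Toy data, not taken from the paper.
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 2, 1]
print(top_k_accuracy(scores, labels, k=1))
print(top_k_accuracy(scores, labels, k=2))
print(panoptic_quality([0.8, 0.6], num_fp=1, num_fn=1))
```

Within one class, PQ equals SQ times RQ. Dataset-level figures are usually averaged over classes, which may be why a reported PQ does not have to equal the product of the reported SQ and RQ.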

Related Papers

Advancing Complex Wide-Area Scene Understanding with Hierarchical Coresets Selection (2025-07-17)
Argus: Leveraging Multiview Images for Improved 3-D Scene Understanding With Large Language Models (2025-07-17)
City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
Learning to Tune Like an Expert: Interpretable and Scene-Aware Navigation via MLLM Reasoning and CVAE-Based Adaptation (2025-07-15)
Tactical Decision for Multi-UGV Confrontation with a Vision-Language Model-Based Commander (2025-07-15)
Seeing the Signs: A Survey of Edge-Deployable OCR Models for Billboard Visibility Analysis (2025-07-15)
DEARLi: Decoupled Enhancement of Recognition and Localization for Semi-supervised Panoptic Segmentation (2025-07-14)
EmbRACE-3K: Embodied Reasoning and Action in Complex Environments (2025-07-14)