Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation

Zhening Huang, Xiaoyang Wu, Xi Chen, Hengshuang Zhao, Lei Zhu, Joan Lasenby

2023-09-01
Tasks: Zero-shot 3D Point Cloud Classification, 3D Open-Vocabulary Object Detection, Scene Understanding, Semantic Segmentation, 3D Open-Vocabulary Instance Segmentation, Open Vocabulary Object Detection, Instance Segmentation, object-detection, Object Detection

Abstract

In this work, we introduce OpenIns3D, a new 3D-input-only framework for 3D open-vocabulary scene understanding. The OpenIns3D framework employs a "Mask-Snap-Lookup" scheme. The "Mask" module learns class-agnostic mask proposals in 3D point clouds, the "Snap" module generates synthetic scene-level images at multiple scales and leverages 2D vision-language models to extract interesting objects, and the "Lookup" module searches through the outcomes of "Snap" to assign category names to the proposed masks. This approach, though simple, achieves state-of-the-art performance across a wide range of 3D open-vocabulary tasks, including recognition, object detection, and instance segmentation, on both indoor and outdoor datasets. Moreover, OpenIns3D facilitates effortless switching between different 2D detectors without requiring retraining. When integrated with powerful 2D open-world models, it achieves excellent results in scene understanding tasks. Furthermore, when combined with LLM-powered 2D models, OpenIns3D exhibits an impressive capability to comprehend and process highly complex text queries that demand intricate reasoning and real-world knowledge. Project page: https://zheninghuang.github.io/OpenIns3D/
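
The "Lookup" idea in the abstract, assigning category names to class-agnostic masks by searching 2D detection results, can be illustrated with a toy sketch. This is not the authors' implementation: it assumes each 3D mask proposal has already been projected into one synthetic image and reduced to a 2D bounding box, and it matches boxes by intersection-over-union. The names `box_iou`, `lookup`, and `min_iou` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[int, int, int, int]


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def lookup(
    mask_boxes: Dict[int, Box],
    detections: List[Tuple[str, Box]],
    min_iou: float = 0.5,
) -> Dict[int, Optional[str]]:
    """Give each projected mask the label of its best-overlapping 2D detection.

    Masks with no detection at or above `min_iou` are left unlabeled (None).
    """
    labels: Dict[int, Optional[str]] = {}
    for mask_id, mask_box in mask_boxes.items():
        best_label, best_iou = None, min_iou
        for name, det_box in detections:
            iou = box_iou(mask_box, det_box)
            if iou >= best_iou:
                best_label, best_iou = name, iou
        labels[mask_id] = best_label
    return labels


# Toy example: two projected mask proposals, two 2D detections.
example_masks = {0: (0, 0, 10, 10), 1: (20, 20, 30, 30)}
example_dets = [("chair", (1, 1, 10, 10)), ("table", (50, 50, 60, 60))]
print(lookup(example_masks, example_dets))  # {0: 'chair', 1: None}
```

Because the matching step only consumes labeled 2D boxes, any 2D detector could in principle supply `detections`, which is consistent with the abstract's claim that detectors can be swapped without retraining.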

Results

Task | Dataset | Metric | Value | Model
3D Open-Vocabulary Instance Segmentation | ScanNet200 | AP Common | 14.2 | OpenIns3D
3D Open-Vocabulary Instance Segmentation | ScanNet200 | AP Head | 19.2 | OpenIns3D
3D Open-Vocabulary Instance Segmentation | ScanNet200 | AP Tail | 14.2 | OpenIns3D
3D Open-Vocabulary Instance Segmentation | ScanNet200 | AP25 | 23.3 | OpenIns3D
3D Open-Vocabulary Instance Segmentation | ScanNet200 | AP50 | 20.6 | OpenIns3D
3D Open-Vocabulary Instance Segmentation | ScanNet200 | mAP | 15.9 | OpenIns3D
3D Open-Vocabulary Instance Segmentation | ScanNet200 | AP Common | 6.5 | OpenIns3D (3d only)
3D Open-Vocabulary Instance Segmentation | ScanNet200 | AP Head | 16 | OpenIns3D (3d only)
3D Open-Vocabulary Instance Segmentation | ScanNet200 | AP Tail | 4.2 | OpenIns3D (3d only)
3D Open-Vocabulary Instance Segmentation | ScanNet200 | AP25 | 14.4 | OpenIns3D (3d only)
3D Open-Vocabulary Instance Segmentation | ScanNet200 | AP50 | 10.3 | OpenIns3D (3d only)
3D Open-Vocabulary Instance Segmentation | ScanNet200 | mAP | 8.8 | OpenIns3D (3d only)
3D Open-Vocabulary Instance Segmentation | Replica | mAP | 21.1 | OpenIns3D (with rgbd)
3D Open-Vocabulary Instance Segmentation | Replica | mAP | 15.4 | OpenIns3D
3D Open-Vocabulary Instance Segmentation | STPLS3D | AP50 | 13.3 | OPENINS3D
3D Open-Vocabulary Instance Segmentation | S3DIS | AP50 Novel B6/N6 | 33 | OpenIns3D
3D Open-Vocabulary Instance Segmentation | S3DIS | AP50 Novel B8/N4 | 37 | OpenIns3D

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Advancing Complex Wide-Area Scene Understanding with Hierarchical Coresets Selection (2025-07-17)
Argus: Leveraging Multiview Images for Improved 3-D Scene Understanding With Large Language Models (2025-07-17)
City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)