OpenMask3D: Open-Vocabulary 3D Instance Segmentation

Published: 2023-06-23 | Venue: NeurIPS 2023
Tasks: 3D Instance Segmentation, Scene Understanding, Segmentation, Semantic Segmentation, 3D Open-Vocabulary Instance Segmentation, Instance Segmentation

Abstract

We introduce the task of open-vocabulary 3D instance segmentation. Current approaches for 3D instance segmentation can typically only recognize object categories from a pre-defined closed set of classes that are annotated in the training datasets. This results in important limitations for real-world applications where one might need to perform tasks guided by novel, open-vocabulary queries related to a wide variety of objects. Recently, open-vocabulary 3D scene understanding methods have emerged to address this problem by learning queryable features for each point in the scene. While such a representation can be directly employed to perform semantic segmentation, existing methods cannot separate multiple object instances. In this work, we address this limitation, and propose OpenMask3D, which is a zero-shot approach for open-vocabulary 3D instance segmentation. Guided by predicted class-agnostic 3D instance masks, our model aggregates per-mask features via multi-view fusion of CLIP-based image embeddings. Experiments and ablation studies on ScanNet200 and Replica show that OpenMask3D outperforms other open-vocabulary methods, especially on the long-tail distribution. Qualitative experiments further showcase OpenMask3D's ability to segment object properties based on free-form queries describing geometry, affordances, and materials.
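The per-mask aggregation and query step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how views are selected or how embeddings are fused, so plain averaging of normalized per-view embeddings is an assumption here, and the function names (`fuse_mask_feature`, `rank_masks`) and the toy 3-dimensional vectors standing in for CLIP embeddings are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def fuse_mask_feature(view_embeddings):
    """Fuse one mask's per-view image embeddings into a single feature.

    Assumption: simple mean of unit-normalized embeddings, then renormalize.
    """
    if not view_embeddings:
        raise ValueError("need at least one view embedding per mask")
    normalized = [l2_normalize(v) for v in view_embeddings]
    dim = len(normalized[0])
    mean = [sum(v[i] for v in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)


def rank_masks(mask_features, query_embedding):
    """Return (mask_id, cosine similarity) pairs, best match first."""
    query = l2_normalize(query_embedding)
    scores = {
        mask_id: sum(a * b for a, b in zip(feature, query))
        for mask_id, feature in mask_features.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy stand-ins for image embeddings of two class-agnostic instance masks,
# each observed in two views.
views_per_mask = {
    "mask_0": [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]],
    "mask_1": [[0.0, 1.0, 0.1], [0.1, 0.8, 0.0]],
}
mask_features = {m: fuse_mask_feature(v) for m, v in views_per_mask.items()}

# Toy stand-in for the text embedding of a free-form query.
ranking = rank_masks(mask_features, [1.0, 0.0, 0.0])
print(ranking[0][0])  # mask_0
```

Because every mask carries its own feature vector, two objects of the same category remain separate entries in the ranking, which is the instance-level distinction that per-point feature methods lack.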

Results

Task	Dataset	Metric	Value	Model
3D Open-Vocabulary Instance Segmentation	ScanNet200	AP Common	14.1	OpenMask3D
3D Open-Vocabulary Instance Segmentation	ScanNet200	AP Head	17.1	OpenMask3D
3D Open-Vocabulary Instance Segmentation	ScanNet200	AP Tail	14.9	OpenMask3D
3D Open-Vocabulary Instance Segmentation	ScanNet200	AP25	23.1	OpenMask3D
3D Open-Vocabulary Instance Segmentation	ScanNet200	AP50	19.9	OpenMask3D
3D Open-Vocabulary Instance Segmentation	ScanNet200	mAP	15.4	OpenMask3D
3D Open-Vocabulary Instance Segmentation	Replica	mAP	13.1	OpenMask3D
