Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

ODIN: A Single Model for 2D and 3D Segmentation

Ayush Jain, Pushkal Katara, Nikolaos Gkanatsios, Adam W. Harley, Gabriel Sarch, Kriti Aggarwal, Vishrav Chaudhary, Katerina Fragkiadaki

2024-01-04 · CVPR 2024
Tasks: 3D Instance Segmentation, Segmentation, Semantic Segmentation, Instance Segmentation, 3D Semantic Segmentation
Paper · PDF · Code (official)

Abstract

State-of-the-art models on contemporary 3D segmentation benchmarks like ScanNet consume and label dataset-provided 3D point clouds, obtained through post-processing of sensed multiview RGB-D images. They are typically trained in-domain, forgo large-scale 2D pre-training and outperform alternatives that featurize the posed RGB-D multiview images instead. The gap in performance between methods that consume posed images versus post-processed 3D point clouds has fueled the belief that 2D and 3D perception require distinct model architectures. In this paper, we challenge this view and propose ODIN (Omni-Dimensional INstance segmentation), a model that can segment and label both 2D RGB images and 3D point clouds, using a transformer architecture that alternates between 2D within-view and 3D cross-view information fusion. Our model differentiates 2D and 3D feature operations through the positional encodings of the tokens involved, which capture pixel coordinates for 2D patch tokens and 3D coordinates for 3D feature tokens. ODIN achieves state-of-the-art performance on ScanNet200, Matterport3D and AI2THOR 3D instance segmentation benchmarks, and competitive performance on ScanNet, S3DIS and COCO. It outperforms all previous works by a wide margin when the sensed 3D point cloud is used in place of the point cloud sampled from the 3D mesh. When used as the 3D perception engine in an instructable embodied agent architecture, it sets a new state-of-the-art on the TEACh action-from-dialogue benchmark. Our code and checkpoints can be found at the project website (https://odin-seg.github.io).
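
The abstract states the mechanism only at a high level, so here is a toy sketch of the idea, not ODIN's implementation. It assumes NumPy, a pinhole camera model for lifting pixels to 3D from depth and camera pose, parameter-free single-head attention with sinusoidal positional encodings added to the tokens, and dense attention over all tokens in the 3D stage. The function names (`unproject`, `sinusoidal_pe`, `attention`, `fuse`), the intrinsics, poses and feature sizes are all hypothetical. What it illustrates is that one attention operation acts as 2D within-view fusion or 3D cross-view fusion depending only on how tokens are grouped and which coordinates feed the positional encoding.

```python
import numpy as np


def unproject(depth, K, cam_to_world):
    """Lift an (H, W) depth map to world-space XYZ of shape (H, W, 3) (pinhole model)."""
    H, W = depth.shape
    v, u = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")  # row, column indices
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    cam = np.stack([x, y, depth, np.ones_like(depth)], axis=-1)  # homogeneous camera coords
    return (cam @ cam_to_world.T)[..., :3]  # rigid transform into the shared world frame


def sinusoidal_pe(coords, dim):
    """Encode (N, c) coordinates as (N, dim) sin/cos features; dim must be divisible by 2 * c."""
    n, c = coords.shape
    per_axis = dim // (2 * c)
    freqs = 1.0 / (100.0 ** (np.arange(per_axis) / per_axis))
    angles = coords[:, :, None] * freqs  # (N, c, per_axis)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1).reshape(n, dim)


def attention(x, pe):
    """Parameter-free single-head self-attention over tokens x (N, D) with added encodings pe."""
    q = k = x + pe
    scores = q @ k.T / np.sqrt(x.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return x + w @ x  # residual update


def fuse(feats, pix, xyz, n_layers=2):
    """Alternate 2D within-view and 3D cross-view attention.

    feats: (V, N, D) tokens per view; pix: (V, N, 2) pixel coords; xyz: (V, N, 3) world coords.
    """
    V, N, D = feats.shape
    for _ in range(n_layers):
        # 2D stage: each view attends only to its own tokens, positions = pixel coordinates.
        feats = np.stack([attention(feats[i], sinusoidal_pe(pix[i], D)) for i in range(V)])
        # 3D stage: tokens from all views attend jointly, positions = 3D world coordinates.
        flat = attention(feats.reshape(V * N, D), sinusoidal_pe(xyz.reshape(V * N, 3), D))
        feats = flat.reshape(V, N, D)
    return feats


# Toy scene: two 4x4 views from cameras 0.5 m apart along x (made-up intrinsics and poses).
rng = np.random.default_rng(0)
V, H, W, D = 2, 4, 4, 24
K = np.array([[2.0, 0.0, 2.0], [0.0, 2.0, 2.0], [0.0, 0.0, 1.0]])
poses = [np.eye(4), np.eye(4)]
poses[1][0, 3] = 0.5
depths = rng.uniform(1.0, 3.0, size=(V, H, W))

xyz = np.stack([unproject(depths[i], K, poses[i]).reshape(-1, 3) for i in range(V)])
rows, cols = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
pix = np.broadcast_to(np.stack([cols, rows], axis=-1).reshape(-1, 2).astype(float), (V, H * W, 2))
feats = rng.normal(size=(V, H * W, D))  # stand-in for 2D backbone features

out = fuse(feats, pix, xyz)
print(out.shape)  # (2, 16, 24)
```

For a single 2D image with no depth, a model organized this way could simply skip the 3D cross-view stage and keep only the within-view layers; the actual layer design, feature backbone and 3D attention scheme used by ODIN are described in the paper.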

Results

Task                        Dataset        Metric      Value   Model
Semantic Segmentation       ScanNet        test mIoU   74.4    ODIN
Semantic Segmentation       ScanNet        val mIoU    77.8    ODIN
Semantic Segmentation       ScanNet200     test mIoU   36.8    ODIN
Semantic Segmentation       ScanNet200     val mIoU    40.5    ODIN
Instance Segmentation       ScanNet (v2)   mAP         50      ODIN
Instance Segmentation       ScanNet (v2)   mAP@50      71      ODIN
Instance Segmentation       ScanNet (v2)   mAP@25      83.6    ODIN
Instance Segmentation       ScanNet200     mAP         31.5    ODIN
Instance Segmentation       ScanNet200     mAP@25      53.1    ODIN
Instance Segmentation       ScanNet200     mAP@50      45.3    ODIN
3D Semantic Segmentation    ScanNet200     test mIoU   36.8    ODIN
3D Semantic Segmentation    ScanNet200     val mIoU    40.5    ODIN
3D Instance Segmentation    ScanNet (v2)   mAP         50      ODIN
3D Instance Segmentation    ScanNet (v2)   mAP@50      71      ODIN
3D Instance Segmentation    ScanNet (v2)   mAP@25      83.6    ODIN
3D Instance Segmentation    ScanNet200     mAP         31.5    ODIN
3D Instance Segmentation    ScanNet200     mAP@25      53.1    ODIN
3D Instance Segmentation    ScanNet200     mAP@50      45.3    ODIN
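
The semantic segmentation rows report mean intersection-over-union (mIoU) as a percentage. As a reminder of what that number measures, here is a minimal NumPy sketch of per-class IoU averaged over classes, for flat arrays of per-point labels concatenated over a split. It is an illustration only: the function name, the `ignore_index=-1` convention and the rule of skipping classes absent from both prediction and ground truth are assumptions, and the official ScanNet and ScanNet200 evaluation scripts define their own class lists and handling of unlabeled points.

```python
import numpy as np


def mean_iou(pred, gt, num_classes, ignore_index=-1):
    """Average per-class IoU over flat label arrays; classes absent from both are skipped."""
    pred, gt = np.asarray(pred), np.asarray(gt)
    valid = gt != ignore_index  # drop unlabeled points from the evaluation
    pred, gt = pred[valid], gt[valid]
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))


gt = [0, 0, 1, 1, 2, -1]  # last point is unlabeled
pred = [0, 1, 1, 1, 2, 0]
print(round(100 * mean_iou(pred, gt, num_classes=3), 1))  # 72.2
```

Here the per-class IoUs are 1/2, 2/3 and 1, so the mean is 13/18; scaled by 100 it is in the same unit as the mIoU values in the table.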

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation (2025-07-17)
Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)