Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

DINO in the Room: Leveraging 2D Foundation Models for 3D Segmentation

Karim Abou Zeid, Kadir Yilmaz, Daan de Geus, Alexander Hermans, David Adrian, Timm Linder, Bastian Leibe

2025-03-24
Tasks: Segmentation, Semantic Segmentation, Point Cloud Segmentation, 3D Semantic Segmentation, LIDAR Semantic Segmentation

Abstract

Vision foundation models (VFMs) trained on large-scale image datasets provide high-quality features that have significantly advanced 2D visual recognition. However, their potential in 3D vision remains largely untapped, despite the common availability of 2D images alongside 3D point cloud datasets. While significant research has been dedicated to 2D-3D fusion, recent state-of-the-art 3D methods predominantly focus on 3D data, leaving the integration of VFMs into 3D models underexplored. In this work, we challenge this trend by introducing DITR, a simple yet effective approach that extracts 2D foundation model features, projects them to 3D, and finally injects them into a 3D point cloud segmentation model. DITR achieves state-of-the-art results on both indoor and outdoor 3D semantic segmentation benchmarks. To enable the use of VFMs even when images are unavailable during inference, we further propose to distill 2D foundation models into a 3D backbone as a pretraining task. By initializing the 3D backbone with knowledge distilled from 2D VFMs, we create a strong basis for downstream 3D segmentation tasks, ultimately boosting performance across various datasets.
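
The "project them to 3D" step described in the abstract can be illustrated with a minimal sketch. It assumes a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy`, points already expressed in camera coordinates, and a per-pixel feature map stored as nested lists. The function name `lift_features_to_points` and all parameters are hypothetical and chosen for illustration. This is not the authors' implementation, and it ignores occlusion and multi-view aggregation, which a real pipeline would likely need to handle.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
Feature = Sequence[float]


def lift_features_to_points(
    points_cam: Sequence[Point],
    feature_map: Sequence[Sequence[Feature]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Optional[Feature]]:
    """Give each 3D point the feature of the pixel it projects onto.

    Assumptions (illustrative only): pinhole projection, points in camera
    coordinates with +z pointing forward, feature_map indexed as [row][col].
    Points behind the camera or outside the image receive None.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    lifted: List[Optional[Feature]] = []
    for x, y, z in points_cam:
        if z <= 0:
            # The point is behind the camera and cannot be seen.
            lifted.append(None)
            continue
        # Perspective projection to continuous pixel coordinates.
        u = fx * x / z + cx
        v = fy * y / z + cy
        col, row = math.floor(u), math.floor(v)
        if 0 <= col < width and 0 <= row < height:
            lifted.append(feature_map[row][col])
        else:
            lifted.append(None)
    return lifted
```

Points that receive `None` have no image evidence in this view. How such points are treated (for example, filled with zeros or with a learned placeholder) is a design choice that this page does not specify.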

Results

Task                         Dataset             Metric      Value   Model
Semantic Segmentation        ScanNet             test mIoU   79.7    DITR
Semantic Segmentation        ScanNet             val mIoU    80.5    DITR
Semantic Segmentation        S3DIS Area5         mIoU        75      D-DITR
Semantic Segmentation        S3DIS Area5         mIoU        74.1    DITR
Semantic Segmentation        ScanNet200          test mIoU   44.9    DITR
Semantic Segmentation        ScanNet200          val mIoU    41.2    DITR
Semantic Segmentation        Waymo Open Dataset  mIoU        73.3    DITR
Semantic Segmentation        ScanNet++           Top-1 IoU   0.525   DITR
Semantic Segmentation        ScanNet++           Top-3 IoU   0.762   DITR
3D Semantic Segmentation     ScanNet200          test mIoU   44.9    DITR
3D Semantic Segmentation     ScanNet200          val mIoU    41.2    DITR
3D Semantic Segmentation     Waymo Open Dataset  mIoU        73.3    DITR
3D Semantic Segmentation     ScanNet++           Top-1 IoU   0.525   DITR
3D Semantic Segmentation     ScanNet++           Top-3 IoU   0.762   DITR
LIDAR Semantic Segmentation  nuScenes            test mIoU   0.851   DITR
LIDAR Semantic Segmentation  nuScenes            val mIoU    0.842   DITR
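
Most results above are reported as mIoU (mean intersection over union). As a reminder of how that metric is typically computed, here is a minimal sketch over flat lists of per-point class labels. The function name `mean_iou` and the `ignore_label` convention are assumptions for illustration. Each benchmark has its own evaluation protocol (class lists, ignored labels, averaging), which this sketch does not reproduce.

```python
from typing import Sequence


def mean_iou(
    predictions: Sequence[int],
    labels: Sequence[int],
    num_classes: int,
    ignore_label: int = -1,
) -> float:
    """Mean of per-class IoU = TP / (TP + FP + FN), over classes that occur.

    Points whose ground-truth label equals ignore_label are skipped
    (an assumed convention, not taken from any specific benchmark).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred, label in zip(predictions, labels):
        if label == ignore_label:
            continue
        if pred == label:
            # True positive: counts once in both intersection and union.
            intersection[label] += 1
            union[label] += 1
        else:
            # False negative for the true class ...
            union[label] += 1
            # ... and false positive for the predicted class.
            if 0 <= pred < num_classes:
                union[pred] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")
```

The function returns a fraction in [0, 1]. The table appears to mix two scales: values such as 79.7 read as percentages, while values such as 0.851 read as fractions.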

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation (2025-07-17)
Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)