Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding

Guangda Ji, Silvan Weder, Francis Engelmann, Marc Pollefeys, Hermann Blum

2024-10-17 | CVPR 2025
Tasks: Scene Understanding, Semantic Segmentation, Image Generation, 3D Semantic Segmentation

Abstract

The performance of neural networks scales with both their size and the amount of data they have been trained on. This has been shown for both language and image generation. However, this requires scaling-friendly network architectures as well as large-scale datasets. Even though scaling-friendly architectures like transformers have emerged for 3D vision tasks, the GPT-moment of 3D vision remains distant due to the lack of training data. In this paper, we introduce ARKit LabelMaker, the first large-scale, real-world 3D dataset with dense semantic annotations. Specifically, we complement the ARKitScenes dataset with dense semantic annotations that are automatically generated at scale. To this end, we extend LabelMaker, a recent automatic annotation pipeline, to serve the needs of large-scale pre-training. This involves extending the pipeline with cutting-edge segmentation models as well as making it robust to the challenges of large-scale processing. Further, we push forward the state-of-the-art performance on the ScanNet and ScanNet200 datasets with prevalent 3D semantic segmentation models, demonstrating the efficacy of our generated dataset.

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Segmentation | ScanNet | test mIoU | 79.8 | PTv3 ARKit LabelMaker |
| Semantic Segmentation | ScanNet | val mIoU | 79.1 | PTv3 ARKit LabelMaker |
| Semantic Segmentation | ScanNet200 | test mIoU | 41.4 | PTv3 ArKitLabelmaker |
| Semantic Segmentation | ScanNet200 | val mIoU | 40.3 | PTv3 ArKitLabelmaker |
| 3D Semantic Segmentation | ScanNet200 | test mIoU | 41.4 | PTv3 ArKitLabelmaker |
| 3D Semantic Segmentation | ScanNet200 | val mIoU | 40.3 | PTv3 ArKitLabelmaker |
| 10-shot image generation | ScanNet | test mIoU | 79.8 | PTv3 ARKit LabelMaker |
| 10-shot image generation | ScanNet | val mIoU | 79.1 | PTv3 ARKit LabelMaker |
| 10-shot image generation | ScanNet200 | test mIoU | 41.4 | PTv3 ArKitLabelmaker |
| 10-shot image generation | ScanNet200 | val mIoU | 40.3 | PTv3 ArKitLabelmaker |
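
The results above are reported as mIoU (mean intersection-over-union). As a rough illustration of how that metric is commonly computed, here is a minimal sketch using the standard definition: per-class IoU = TP / (TP + FP + FN), averaged over classes. It is not the paper's or the benchmark's evaluation code; the function name `mean_iou`, the `ignore_label` convention, and the toy labels are assumptions made for this example.

```python
from collections import Counter


def mean_iou(pred, target, num_classes, ignore_label=-1):
    """Mean IoU over classes that appear in the predictions or the ground truth.

    pred, target: equal-length sequences of integer class ids (one per point).
    ignore_label: hypothetical id marking unlabeled points, which are skipped.
    """
    inter = Counter()         # points where prediction and ground truth agree
    pred_count = Counter()    # points predicted as each class
    target_count = Counter()  # points labeled as each class
    for p, t in zip(pred, target):
        if t == ignore_label:
            continue
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            inter[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - inter[c]
        if union == 0:
            continue  # class absent from both prediction and ground truth
        ious.append(inter[c] / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and six points.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU: {100 * score:.1f}")  # scaled to a percentage, as in the table
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5. Skipping classes with an empty union is one possible design choice; an evaluation script may instead average over a fixed class list.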

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Advancing Complex Wide-Area Scene Understanding with Hierarchical Coresets Selection (2025-07-17)
Argus: Leveraging Multiview Images for Improved 3-D Scene Understanding With Large Language Models (2025-07-17)
City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)