Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm

Haoyi Zhu, Honghui Yang, Xiaoyang Wu, Di Huang, Sha Zhang, Xianglong He, Hengshuang Zhao, Chunhua Shen, Yu Qiao, Tong He, Wanli Ouyang

2023-10-12
Tasks: Neural Rendering, Semantic Segmentation, 3D Reconstruction, Image Generation, 3D Semantic Segmentation, 3D Object Detection, LIDAR Semantic Segmentation

Abstract

In contrast to numerous NLP and 2D vision foundational models, learning a 3D foundational model poses considerably greater challenges. This is primarily due to the inherent data variability and diversity of downstream tasks. In this paper, we introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representation, thereby establishing a pathway to 3D foundational models. Considering that informative 3D features should encode rich geometry and appearance cues that can be utilized to render realistic images, we propose to learn 3D representations by differentiable neural rendering. We train a 3D backbone with a devised volumetric neural renderer by comparing the rendered with the real images. Notably, our approach seamlessly integrates the learned 3D encoder into various downstream tasks. These tasks encompass not only high-level challenges such as 3D detection and segmentation but also low-level objectives like 3D reconstruction and image synthesis, spanning both indoor and outdoor scenarios. Besides, we also illustrate the capability of pre-training a 2D backbone using the proposed methodology, surpassing conventional pre-training methods by a large margin. For the first time, PonderV2 achieves state-of-the-art performance on 11 indoor and outdoor benchmarks, implying its effectiveness. Code and models are available at https://github.com/OpenGVLab/PonderV2.
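The idea of supervising 3D features by "comparing the rendered with the real images" can be illustrated with a minimal sketch of volume rendering along a single camera ray. This is not the paper's renderer: the abstract does not specify its formulation, so the density-based compositing below (the common NeRF-style quadrature) is an assumption, and the names `render_ray` and `photometric_l1` are hypothetical. In a real pipeline the densities and colors would be predicted from the 3D backbone's features and the loss would be backpropagated through the renderer.

```python
import math


def render_ray(densities, colors, deltas):
    """Composite per-sample colors along one ray into a single pixel color.

    densities: non-negative volume density at each sample along the ray
    colors:    RGB triple at each sample
    deltas:    distance between consecutive samples
    Assumption: standard density-based quadrature, not necessarily PonderV2's.
    """
    transmittance = 1.0  # fraction of light that has not been absorbed yet
    pixel = [0.0, 0.0, 0.0]
    for sigma, rgb, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha          # contribution of this sample
        pixel = [p + weight * c for p, c in zip(pixel, rgb)]
        transmittance *= 1.0 - alpha
    return pixel


def photometric_l1(rendered, target):
    """Mean absolute difference between a rendered and a real pixel color."""
    return sum(abs(r - t) for r, t in zip(rendered, target)) / len(target)


# Empty space first, then an effectively opaque green surface.
pixel = render_ray(
    densities=[0.0, 1e9],
    colors=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    deltas=[0.1, 0.1],
)
loss = photometric_l1(pixel, [0.0, 1.0, 0.0])
```

Because every step is a smooth function of the densities and colors, the photometric loss yields gradients for whatever network produced them, which is what makes rendering usable as a pre-training signal.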

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Semantic Segmentation | ScanNet | test mIoU | 78.5 | PonderV2 + SparseUNet |
| Semantic Segmentation | ScanNet | val mIoU | 77 | PonderV2 + SparseUNet |
| Semantic Segmentation | S3DIS Area5 | mAcc | 79 | PonderV2 + SparseUNet |
| Semantic Segmentation | S3DIS Area5 | mIoU | 73.2 | PonderV2 + SparseUNet |
| Semantic Segmentation | S3DIS Area5 | oAcc | 92.2 | PonderV2 + SparseUNet |
| Semantic Segmentation | S3DIS | Mean IoU | 79.9 | PonderV2 + SparseUNet |
| Semantic Segmentation | S3DIS | mAcc | 86.5 | PonderV2 + SparseUNet |
| Semantic Segmentation | S3DIS | oAcc | 92.5 | PonderV2 + SparseUNet |
| Semantic Segmentation | ScanNet200 | test mIoU | 34.6 | PonderV2 + SparseUNet |
| Semantic Segmentation | ScanNet200 | val mIoU | 32.3 | PonderV2 + SparseUNet |
| 3D Semantic Segmentation | ScanNet200 | test mIoU | 34.6 | PonderV2 + SparseUNet |
| 3D Semantic Segmentation | ScanNet200 | val mIoU | 32.3 | PonderV2 + SparseUNet |
| 10-shot image generation | ScanNet | test mIoU | 78.5 | PonderV2 + SparseUNet |
| 10-shot image generation | ScanNet | val mIoU | 77 | PonderV2 + SparseUNet |
| 10-shot image generation | S3DIS Area5 | mAcc | 79 | PonderV2 + SparseUNet |
| 10-shot image generation | S3DIS Area5 | mIoU | 73.2 | PonderV2 + SparseUNet |
| 10-shot image generation | S3DIS Area5 | oAcc | 92.2 | PonderV2 + SparseUNet |
| 10-shot image generation | S3DIS | Mean IoU | 79.9 | PonderV2 + SparseUNet |
| 10-shot image generation | S3DIS | mAcc | 86.5 | PonderV2 + SparseUNet |
| 10-shot image generation | S3DIS | oAcc | 92.5 | PonderV2 + SparseUNet |
| 10-shot image generation | ScanNet200 | test mIoU | 34.6 | PonderV2 + SparseUNet |
| 10-shot image generation | ScanNet200 | val mIoU | 32.3 | PonderV2 + SparseUNet |
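For readers unfamiliar with the metrics reported above, the following sketch shows how mIoU, mAcc, and oAcc are conventionally computed from a class confusion matrix. It is an illustration only: the function name `segmentation_metrics` is hypothetical, the benchmarks' official evaluation scripts may differ in detail (for example in how ignored labels or absent classes are handled), and the values here are fractions, whereas the table reports percentages.

```python
def segmentation_metrics(confusion):
    """Compute mIoU, mAcc, and oAcc from a square confusion matrix.

    confusion[i][j] is the number of points whose true class is i and
    whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))

    ious, accs = [], []
    for i in range(n):
        tp = confusion[i][i]
        gt = sum(confusion[i])                         # points truly in class i
        pred = sum(confusion[j][i] for j in range(n))  # points predicted as i
        union = gt + pred - tp
        # Assumption: classes that never occur are skipped, not counted as 0.
        if gt > 0:
            accs.append(tp / gt)
        if union > 0:
            ious.append(tp / union)

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return {
        "mIoU": mean(ious),  # mean per-class intersection over union
        "mAcc": mean(accs),  # mean per-class accuracy
        "oAcc": correct / total if total else 0.0,  # overall point accuracy
    }


# Two-class toy example with 20 labeled points.
metrics = segmentation_metrics([[8, 2], [1, 9]])
```

mIoU penalizes both missed and spurious predictions for each class, which is why it typically sits below oAcc on datasets with imbalanced classes.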

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
AutoPartGen: Autogressive 3D Part Generation and Discovery (2025-07-17)
fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection (2025-07-17)