Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models

Ziyi Wang, Xumin Yu, Yongming Rao, Jie Zhou, Jiwen Lu

2023-07-27 · ICCV 2023 · 3D Part Segmentation · 3D Point Cloud Classification
Paper · PDF · Code (official)

Abstract

With the overwhelming trend of masked image modeling led by MAE, generative pre-training has shown remarkable potential to boost the performance of foundation models in 2D vision. However, in 3D vision, the over-reliance on Transformer-based backbones and the unordered nature of point clouds have restricted the further development of generative pre-training. In this paper, we propose a novel 3D-to-2D generative pre-training method that is adaptable to any point cloud model. As the pre-training scheme, we generate view images from different instructed poses via a cross-attention mechanism. Generating view images provides more precise supervision than its point cloud counterpart, helping 3D backbones acquire a finer comprehension of the geometrical structure and stereoscopic relations of the point cloud. Experimental results demonstrate the superiority of our proposed 3D-to-2D generative pre-training over previous pre-training methods. Our method is also effective in boosting the performance of architecture-oriented approaches, achieving state-of-the-art performance when fine-tuning on ScanObjectNN classification and ShapeNetPart segmentation tasks. Code is available at https://github.com/wangzy22/TAP.
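The pre-training scheme sketched in the abstract amounts to a cross-attention step: learned pose embeddings act as queries, and point-cloud token features act as keys and values, so each query pulls view-dependent information out of the point features before a 2D image decoder. Below is a minimal NumPy sketch of that single step; the function name, shapes, and random projection weights (standing in for learned parameters) are illustrative assumptions, not the TAP implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_view_features(pose_queries, point_feats, d_model, rng):
    """One cross-attention step: pose embeddings query point-cloud features.

    pose_queries: (n_views, d_pose)  -- one embedding per instructed camera pose
    point_feats:  (n_points, d_pt)   -- per-point (or per-token) backbone features
    Returns:      (n_views, d_model) -- pose-conditioned features for a 2D decoder
    """
    # Random projections stand in for learned Wq/Wk/Wv weight matrices.
    Wq = rng.standard_normal((pose_queries.shape[-1], d_model))
    Wk = rng.standard_normal((point_feats.shape[-1], d_model))
    Wv = rng.standard_normal((point_feats.shape[-1], d_model))
    Q = pose_queries @ Wq                     # (n_views, d_model)
    K = point_feats @ Wk                      # (n_points, d_model)
    V = point_feats @ Wv                      # (n_points, d_model)
    attn = softmax(Q @ K.T / np.sqrt(d_model))  # (n_views, n_points)
    return attn @ V                           # (n_views, d_model)
```

In this reading, each row of the attention matrix is a distribution over point tokens, so each instructed pose aggregates a different weighted view of the geometry; rendering those aggregated features into images is what supplies the 2D supervision signal.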

Results

Task | Dataset | Metric | Value | Model
Semantic Segmentation | ShapeNet-Part | Class Average IoU | 85.2 | PointMLP+TAP
Semantic Segmentation | ShapeNet-Part | Instance Average IoU | 86.9 | PointMLP+TAP
3D Point Cloud Classification | ScanObjectNN | Overall Accuracy | 88.5 | PointMLP+TAP
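The two ShapeNet-Part numbers differ only in how per-shape IoU scores are averaged: Instance Average IoU averages over all test shapes, while Class Average IoU first averages within each object category and then averages the category means. A minimal sketch of the per-shape part mIoU underlying both (the convention of scoring a part absent from both prediction and ground truth as IoU 1 is an assumption of this sketch, though it is common in ShapeNet-Part evaluation):

```python
import numpy as np

def part_iou(pred, gt, num_parts):
    """Mean IoU over part labels for one shape.

    pred, gt: integer part labels per point, shape (n_points,)
    """
    ious = []
    for p in range(num_parts):
        inter = np.logical_and(pred == p, gt == p).sum()
        union = np.logical_or(pred == p, gt == p).sum()
        # A part missing from both prediction and ground truth counts as 1.0.
        ious.append(1.0 if union == 0 else inter / union)
    return float(np.mean(ious))
```

For example, with `pred = [0, 0, 1, 1]` and `gt = [0, 1, 1, 1]`, part 0 scores 1/2 and part 1 scores 2/3, so the shape's mIoU is 7/12. Averaging such per-shape scores over shapes gives Instance Average IoU; averaging per-category means gives Class Average IoU.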

Related Papers

- Asymmetric Dual Self-Distillation for 3D Self-Supervised Representation Learning (2025-06-26)
- Rethinking Gradient-based Adversarial Attacks on Point Cloud Classification (2025-05-28)
- SMART-PC: Skeletal Model Adaptation for Robust Test-Time Training in Point Clouds (2025-05-26)
- DG-MVP: 3D Domain Generalization via Multiple Views of Point Clouds for Classification (2025-04-16)
- HoloPart: Generative 3D Part Amodal Segmentation (2025-04-10)
- Introducing the Short-Time Fourier Kolmogorov Arnold Network: A Dynamic Graph CNN Approach for Tree Species Classification in 3D Point Clouds (2025-03-31)
- Open-Vocabulary Semantic Part Segmentation of 3D Human (2025-02-27)
- Point-LN: A Lightweight Framework for Efficient Point Cloud Classification Using Non-Parametric Positional Encoding (2025-01-24)