Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders

Renrui Zhang, Liuhui Wang, Yu Qiao, Peng Gao, Hongsheng Li

2022-12-13 · CVPR 2023
Tasks: 3D Point Cloud Linear Classification · Few-Shot 3D Point Cloud Classification · 3D Point Cloud Classification
Paper · PDF · Code (official)

Abstract

Pre-training on large-scale image data has become the de facto approach for learning robust 2D representations. In contrast, due to expensive data acquisition and annotation, the paucity of large-scale 3D datasets severely hinders the learning of high-quality 3D features. In this paper, we propose an alternative that obtains superior 3D representations from 2D pre-trained models via Image-to-Point Masked Autoencoders, named I2P-MAE. Through self-supervised pre-training, we leverage the well-learned 2D knowledge to guide 3D masked autoencoding, which reconstructs the masked point tokens with an encoder-decoder architecture. Specifically, we first utilize off-the-shelf 2D models to extract multi-view visual features of the input point cloud, and then conduct two types of image-to-point learning schemes on top. First, we introduce a 2D-guided masking strategy that keeps semantically important point tokens visible to the encoder. Compared to random masking, the network can better concentrate on significant 3D structures and recover the masked tokens from key spatial cues. Second, we enforce these visible tokens to reconstruct the corresponding multi-view 2D features after the decoder. This enables the network to effectively inherit high-level 2D semantics learned from rich image data for discriminative 3D modeling. Aided by our image-to-point pre-training, the frozen I2P-MAE, without any fine-tuning, achieves 93.4% accuracy for linear SVM on ModelNet40, competitive with the fully trained results of existing methods. By further fine-tuning on ScanObjectNN's hardest split, I2P-MAE attains a state-of-the-art 90.11% accuracy, +3.68% over the second-best, demonstrating superior transfer capacity. Code will be available at https://github.com/ZrrSkywalker/I2P-MAE.
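The 2D-guided masking idea described above can be sketched in a few lines: instead of masking point tokens uniformly at random, tokens are dropped with probability inversely related to a per-token saliency score aggregated from projected 2D features, so semantically important tokens tend to stay visible. The function below is a minimal illustrative sketch with numpy, not the paper's implementation; the saliency input and the inverse-saliency sampling rule are assumptions for demonstration.

```python
import numpy as np

def saliency_guided_mask(saliency, mask_ratio=0.6, seed=None):
    """Pick which point tokens to mask, biased away from high-saliency tokens.

    saliency: (N,) non-negative per-token scores, e.g. aggregated multi-view
        2D feature responses (hypothetical input for this sketch).
    mask_ratio: fraction of tokens to mask.
    Returns a boolean array of shape (N,), True = masked (to be reconstructed).
    """
    rng = np.random.default_rng(seed)
    n = saliency.shape[0]
    n_mask = int(round(n * mask_ratio))
    # Sampling probability is inversely related to saliency, so salient
    # tokens are rarely masked and remain visible to the encoder.
    p = 1.0 / (saliency - saliency.min() + 1e-6)
    p = p / p.sum()
    masked_idx = rng.choice(n, size=n_mask, replace=False, p=p)
    mask = np.zeros(n, dtype=bool)
    mask[masked_idx] = True
    return mask

# Toy usage: 10 tokens, the first 5 highly salient; with a 50% mask ratio,
# the masked set is drawn mostly from the low-saliency half.
sal = np.array([5.0] * 5 + [0.5] * 5)
mask = saliency_guided_mask(sal, mask_ratio=0.5, seed=0)
```

Random masking corresponds to the special case of uniform saliency; the bias above is one simple way to realize "semantically important tokens stay visible" and is not the paper's exact formulation.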

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Shape Representation Of 3D Point Clouds | ScanObjectNN | OBJ-BG (OA) | 94.15 | I2P-MAE (no voting) |
| Shape Representation Of 3D Point Clouds | ScanObjectNN | OBJ-ONLY (OA) | 91.57 | I2P-MAE (no voting) |
| Shape Representation Of 3D Point Clouds | ScanObjectNN | Overall Accuracy | 90.11 | I2P-MAE (no voting) |
| Shape Representation Of 3D Point Clouds | ModelNet40 10-way (20-shot) | Overall Accuracy | 95.5 | I2P-MAE |
| Shape Representation Of 3D Point Clouds | ModelNet40 10-way (20-shot) | Standard Deviation | 3 | I2P-MAE |
| Shape Representation Of 3D Point Clouds | ModelNet40 5-way (10-shot) | Overall Accuracy | 97 | I2P-MAE |
| Shape Representation Of 3D Point Clouds | ModelNet40 5-way (10-shot) | Standard Deviation | 1.8 | I2P-MAE |
| Shape Representation Of 3D Point Clouds | ModelNet40 10-way (10-shot) | Overall Accuracy | 92.6 | I2P-MAE |
| Shape Representation Of 3D Point Clouds | ModelNet40 10-way (10-shot) | Standard Deviation | 5 | I2P-MAE |
| Shape Representation Of 3D Point Clouds | ModelNet40 5-way (20-shot) | Overall Accuracy | 98.3 | I2P-MAE |
| Shape Representation Of 3D Point Clouds | ModelNet40 5-way (20-shot) | Standard Deviation | 1.3 | I2P-MAE |
| 3D Point Cloud Classification | ScanObjectNN | OBJ-BG (OA) | 94.15 | I2P-MAE (no voting) |
| 3D Point Cloud Classification | ScanObjectNN | OBJ-ONLY (OA) | 91.57 | I2P-MAE (no voting) |
| 3D Point Cloud Classification | ScanObjectNN | Overall Accuracy | 90.11 | I2P-MAE (no voting) |
| 3D Point Cloud Classification | ModelNet40 10-way (20-shot) | Overall Accuracy | 95.5 | I2P-MAE |
| 3D Point Cloud Classification | ModelNet40 10-way (20-shot) | Standard Deviation | 3 | I2P-MAE |
| 3D Point Cloud Classification | ModelNet40 5-way (10-shot) | Overall Accuracy | 97 | I2P-MAE |
| 3D Point Cloud Classification | ModelNet40 5-way (10-shot) | Standard Deviation | 1.8 | I2P-MAE |
| 3D Point Cloud Classification | ModelNet40 10-way (10-shot) | Overall Accuracy | 92.6 | I2P-MAE |
| 3D Point Cloud Classification | ModelNet40 10-way (10-shot) | Standard Deviation | 5 | I2P-MAE |
| 3D Point Cloud Classification | ModelNet40 5-way (20-shot) | Overall Accuracy | 98.3 | I2P-MAE |
| 3D Point Cloud Classification | ModelNet40 5-way (20-shot) | Standard Deviation | 1.3 | I2P-MAE |
| 3D Point Cloud Linear Classification | ModelNet40 | Overall Accuracy | 93.4 | I2P-MAE |
| 3D Point Cloud Reconstruction | ScanObjectNN | OBJ-BG (OA) | 94.15 | I2P-MAE (no voting) |
| 3D Point Cloud Reconstruction | ScanObjectNN | OBJ-ONLY (OA) | 91.57 | I2P-MAE (no voting) |
| 3D Point Cloud Reconstruction | ScanObjectNN | Overall Accuracy | 90.11 | I2P-MAE (no voting) |
| 3D Point Cloud Reconstruction | ModelNet40 10-way (20-shot) | Overall Accuracy | 95.5 | I2P-MAE |
| 3D Point Cloud Reconstruction | ModelNet40 10-way (20-shot) | Standard Deviation | 3 | I2P-MAE |
| 3D Point Cloud Reconstruction | ModelNet40 5-way (10-shot) | Overall Accuracy | 97 | I2P-MAE |
| 3D Point Cloud Reconstruction | ModelNet40 5-way (10-shot) | Standard Deviation | 1.8 | I2P-MAE |
| 3D Point Cloud Reconstruction | ModelNet40 10-way (10-shot) | Overall Accuracy | 92.6 | I2P-MAE |
| 3D Point Cloud Reconstruction | ModelNet40 10-way (10-shot) | Standard Deviation | 5 | I2P-MAE |
| 3D Point Cloud Reconstruction | ModelNet40 5-way (20-shot) | Overall Accuracy | 98.3 | I2P-MAE |
| 3D Point Cloud Reconstruction | ModelNet40 5-way (20-shot) | Standard Deviation | 1.3 | I2P-MAE |

Related Papers

Asymmetric Dual Self-Distillation for 3D Self-Supervised Representation Learning (2025-06-26)
Rethinking Gradient-based Adversarial Attacks on Point Cloud Classification (2025-05-28)
SMART-PC: Skeletal Model Adaptation for Robust Test-Time Training in Point Clouds (2025-05-26)
DG-MVP: 3D Domain Generalization via Multiple Views of Point Clouds for Classification (2025-04-16)
Introducing the Short-Time Fourier Kolmogorov Arnold Network: A Dynamic Graph CNN Approach for Tree Species Classification in 3D Point Clouds (2025-03-31)
Point-LN: A Lightweight Framework for Efficient Point Cloud Classification Using Non-Parametric Positional Encoding (2025-01-24)
AdaCrossNet: Adaptive Dynamic Loss Weighting for Cross-Modal Contrastive Point Cloud Learning (2025-01-02)
Rethinking Masked Representation Learning for 3D Point Cloud Understanding (2024-12-26)