Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

3D-JEPA: A Joint Embedding Predictive Architecture for 3D Self-Supervised Representation Learning

Naiwen Hu, Haozhe Cheng, Yifan Xie, Shiqi Li, Jihua Zhu

2024-09-24
Tasks: Representation Learning, Few-Shot 3D Point Cloud Classification, 3D Part Segmentation, 3D Point Cloud Classification

Abstract

Invariance-based and generative methods have shown strong performance in 3D self-supervised representation learning (SSRL). However, the former rely on hand-crafted data augmentations that introduce biases not applicable to all downstream tasks, and the latter indiscriminately reconstruct masked regions, so irrelevant details are retained in the representation space. To address these problems, we introduce 3D-JEPA, a novel non-generative 3D SSRL framework. Specifically, we propose a multi-block sampling strategy that produces a sufficiently informative context block and several representative target blocks. We present a context-aware decoder to enhance the reconstruction of the target blocks. Concretely, the context information is fed to the decoder continuously, which encourages the encoder to learn semantic modeling rather than memorize context information related to the target blocks. Overall, 3D-JEPA predicts the representations of target blocks from a context block using the encoder and context-aware decoder architecture. Experiments on various downstream tasks and datasets demonstrate 3D-JEPA's effectiveness and efficiency: it achieves higher accuracy with fewer pretraining epochs, e.g., 88.65% accuracy on PB_T50_RS with 150 pretraining epochs.
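
The multi-block sampling idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no block counts, block sizes, or sampling rule, so the function names (`nearest_indices`, `sample_blocks`), the nearest-neighbour grouping, and every default value below are assumptions chosen only to show one context block and several target blocks drawn from a set of point-cloud patch centers, with the target patches removed from the context.

```python
import math
import random


def nearest_indices(centers, seed_idx, k):
    """Return indices of the k patch centers closest to centers[seed_idx] (itself included)."""
    seed = centers[seed_idx]
    order = sorted(range(len(centers)), key=lambda i: math.dist(centers[i], seed))
    return order[:k]


def sample_blocks(centers, num_targets=4, target_size=8, context_size=40, rng=None):
    """Hypothetical multi-block sampler over patch centers.

    Each target block is a spatially contiguous group of patches around a
    random seed. The context block is another contiguous group from which
    every patch that appears in a target block has been removed, so the
    context never contains the patches it is asked to predict.
    All sizes here are illustrative assumptions, not values from the paper.
    """
    rng = rng or random.Random()
    n = len(centers)

    targets = []
    for _ in range(num_targets):
        seed = rng.randrange(n)
        targets.append(nearest_indices(centers, seed, target_size))

    masked = {i for block in targets for i in block}
    context_seed = rng.randrange(n)
    context = [
        i for i in nearest_indices(centers, context_seed, context_size)
        if i not in masked
    ]
    return context, targets


# Usage: 64 random patch centers in the unit cube.
rng = random.Random(0)
centers = [(rng.random(), rng.random(), rng.random()) for _ in range(64)]
context, targets = sample_blocks(centers, rng=rng)
```

In a JEPA-style setup, an encoder would embed the `context` patches and a predictor would be trained to match the embeddings of each block in `targets`; that training loop is omitted here because the listing does not describe it.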

Results

Task | Dataset | Metric | Value | Model
Semantic Segmentation | ShapeNet-Part | Class Average IoU | 86.41 | 3D-JEPA
Semantic Segmentation | ShapeNet-Part | Instance Average IoU | 84.93 | 3D-JEPA
Shape Representation Of 3D Point Clouds | ScanObjectNN | OBJ-BG (OA) | 93.63 | 3D-JEPA
Shape Representation Of 3D Point Clouds | ScanObjectNN | OBJ-ONLY (OA) | 94.49 | 3D-JEPA
Shape Representation Of 3D Point Clouds | ScanObjectNN | Overall Accuracy | 89.52 | 3D-JEPA
Shape Representation Of 3D Point Clouds | ModelNet40 | Overall Accuracy | 94 | 3D-JEPA
Shape Representation Of 3D Point Clouds | ModelNet40 10-way (20-shot) | Overall Accuracy | 96.3 | 3D-JEPA
Shape Representation Of 3D Point Clouds | ModelNet40 10-way (20-shot) | Standard Deviation | 2.4 | 3D-JEPA
Shape Representation Of 3D Point Clouds | ModelNet40 5-way (10-shot) | Overall Accuracy | 97.6 | 3D-JEPA
Shape Representation Of 3D Point Clouds | ModelNet40 5-way (10-shot) | Standard Deviation | 2 | 3D-JEPA
Shape Representation Of 3D Point Clouds | ModelNet40 10-way (10-shot) | Overall Accuracy | 94.3 | 3D-JEPA
Shape Representation Of 3D Point Clouds | ModelNet40 10-way (10-shot) | Standard Deviation | 3.6 | 3D-JEPA
Shape Representation Of 3D Point Clouds | ModelNet40 5-way (20-shot) | Overall Accuracy | 98.8 | 3D-JEPA
Shape Representation Of 3D Point Clouds | ModelNet40 5-way (20-shot) | Standard Deviation | 0.4 | 3D-JEPA
3D Point Cloud Classification | ScanObjectNN | OBJ-BG (OA) | 93.63 | 3D-JEPA
3D Point Cloud Classification | ScanObjectNN | OBJ-ONLY (OA) | 94.49 | 3D-JEPA
3D Point Cloud Classification | ScanObjectNN | Overall Accuracy | 89.52 | 3D-JEPA
3D Point Cloud Classification | ModelNet40 | Overall Accuracy | 94 | 3D-JEPA
3D Point Cloud Classification | ModelNet40 10-way (20-shot) | Overall Accuracy | 96.3 | 3D-JEPA
3D Point Cloud Classification | ModelNet40 10-way (20-shot) | Standard Deviation | 2.4 | 3D-JEPA
3D Point Cloud Classification | ModelNet40 5-way (10-shot) | Overall Accuracy | 97.6 | 3D-JEPA
3D Point Cloud Classification | ModelNet40 5-way (10-shot) | Standard Deviation | 2 | 3D-JEPA
3D Point Cloud Classification | ModelNet40 10-way (10-shot) | Overall Accuracy | 94.3 | 3D-JEPA
3D Point Cloud Classification | ModelNet40 10-way (10-shot) | Standard Deviation | 3.6 | 3D-JEPA
3D Point Cloud Classification | ModelNet40 5-way (20-shot) | Overall Accuracy | 98.8 | 3D-JEPA
3D Point Cloud Classification | ModelNet40 5-way (20-shot) | Standard Deviation | 0.4 | 3D-JEPA
10-shot image generation | ShapeNet-Part | Class Average IoU | 86.41 | 3D-JEPA
10-shot image generation | ShapeNet-Part | Instance Average IoU | 84.93 | 3D-JEPA
3D Point Cloud Reconstruction | ScanObjectNN | OBJ-BG (OA) | 93.63 | 3D-JEPA
3D Point Cloud Reconstruction | ScanObjectNN | OBJ-ONLY (OA) | 94.49 | 3D-JEPA
3D Point Cloud Reconstruction | ScanObjectNN | Overall Accuracy | 89.52 | 3D-JEPA
3D Point Cloud Reconstruction | ModelNet40 | Overall Accuracy | 94 | 3D-JEPA
3D Point Cloud Reconstruction | ModelNet40 10-way (20-shot) | Overall Accuracy | 96.3 | 3D-JEPA
3D Point Cloud Reconstruction | ModelNet40 10-way (20-shot) | Standard Deviation | 2.4 | 3D-JEPA
3D Point Cloud Reconstruction | ModelNet40 5-way (10-shot) | Overall Accuracy | 97.6 | 3D-JEPA
3D Point Cloud Reconstruction | ModelNet40 5-way (10-shot) | Standard Deviation | 2 | 3D-JEPA
3D Point Cloud Reconstruction | ModelNet40 10-way (10-shot) | Overall Accuracy | 94.3 | 3D-JEPA
3D Point Cloud Reconstruction | ModelNet40 10-way (10-shot) | Standard Deviation | 3.6 | 3D-JEPA
3D Point Cloud Reconstruction | ModelNet40 5-way (20-shot) | Overall Accuracy | 98.8 | 3D-JEPA
3D Point Cloud Reconstruction | ModelNet40 5-way (20-shot) | Standard Deviation | 0.4 | 3D-JEPA
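
The results above mix two kinds of aggregate: few-shot rows that pair an overall accuracy with a standard deviation, and segmentation rows that distinguish class-average from instance-average IoU. The sketch below shows how such numbers are commonly computed. It is an illustration under assumptions: the listing does not say how many few-shot trials were run or whether a population or sample standard deviation was used, the helper names `summarize_trials` and `class_and_instance_miou` are hypothetical, and the input values are made-up placeholders, not results from the paper.

```python
import statistics
from collections import defaultdict


def summarize_trials(accuracies):
    """Mean and population standard deviation over independent few-shot trials.

    Assumption: population std (pstdev); a sample std (stdev) is equally
    plausible and would give a slightly larger value.
    """
    return statistics.mean(accuracies), statistics.pstdev(accuracies)


def class_and_instance_miou(per_shape_iou):
    """Aggregate per-shape IoU two ways.

    per_shape_iou: list of (category, iou) pairs, one per test shape.
    Instance average: mean over all shapes, so large categories dominate.
    Class average: mean of per-category means, so every category counts equally.
    """
    by_category = defaultdict(list)
    for category, iou in per_shape_iou:
        by_category[category].append(iou)

    instance_avg = statistics.mean(iou for _, iou in per_shape_iou)
    class_avg = statistics.mean(statistics.mean(v) for v in by_category.values())
    return class_avg, instance_avg


# Placeholder inputs for illustration only.
mean_acc, std_acc = summarize_trials([0.96, 0.98, 1.00])
class_avg, instance_avg = class_and_instance_miou(
    [("chair", 0.9), ("chair", 0.8), ("lamp", 0.6)]
)
```

The two IoU averages differ whenever categories have unequal numbers of shapes or unequal difficulty, which is why benchmarks such as ShapeNet-Part report both.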

Related Papers

Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper (2025-07-20)
Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
Boosting Team Modeling through Tempo-Relational Representation Learning (2025-07-17)
Similarity-Guided Diffusion for Contrastive Sequential Recommendation (2025-07-16)
Are encoders able to learn landmarkers for warm-starting of Hyperparameter Optimization? (2025-07-16)
Language-Guided Contrastive Audio-Visual Masked Autoencoder with Automatically Generated Audio-Visual-Text Triplets from Videos (2025-07-16)
A Mixed-Primitive-based Gaussian Splatting Method for Surface Reconstruction (2025-07-15)
Dual Dimensions Geometric Representation Learning Based Document Dewarping (2025-07-11)