Ayumu Saito, Prachi Kudeshia, Jiju Poovvancheri
Recent advancements in self-supervised learning in the point cloud domain have demonstrated significant potential. However, these methods often suffer from drawbacks such as lengthy pre-training time, the need for reconstruction in the input space, or the need for additional modalities. To address these issues, we introduce Point-JEPA, a joint embedding predictive architecture designed specifically for point cloud data. To this end, we introduce a sequencer that orders point cloud patch embeddings so that their proximity can be computed efficiently and used, via their indices, during target and context selection. The sequencer also allows the proximity computation for the patch embeddings to be shared between context and target selection, further improving efficiency. Experimentally, our method achieves results competitive with state-of-the-art methods while avoiding reconstruction in the input space and additional modalities.
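The sequencer idea described above can be illustrated with a minimal sketch. The abstract does not spell out the ordering rule, so the greedy nearest-neighbor traversal, the choice of starting point (smallest coordinate sum), and the names `greedy_sequence` and `contiguous_block` below are all assumptions made for illustration, not the authors' implementation. The point is only to show how one shared distance computation can yield an index order in which nearby indices correspond to spatially nearby patches.

```python
import numpy as np


def greedy_sequence(centers: np.ndarray) -> np.ndarray:
    """Order patch centers so that consecutive indices are spatially close.

    Hypothetical sketch: a greedy nearest-neighbor traversal, assumed here
    for illustration only.
    centers: (n, 3) array of patch center coordinates.
    Returns an (n,) array of indices into `centers`.
    """
    n = centers.shape[0]
    # Pairwise Euclidean distances, computed once; the resulting order can
    # then be reused for both context and target selection.
    dists = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    visited = np.zeros(n, dtype=bool)
    order = np.empty(n, dtype=np.int64)
    # Assumed starting rule: the center with the smallest coordinate sum.
    current = int(np.argmin(centers.sum(axis=1)))
    for i in range(n):
        order[i] = current
        visited[current] = True
        if i + 1 < n:
            # Mask visited centers, then step to the nearest unvisited one.
            row = np.where(visited, np.inf, dists[current])
            current = int(np.argmin(row))
    return order


def contiguous_block(order: np.ndarray, start: int, length: int) -> np.ndarray:
    """Pick a spatially coherent block of patches as a slice of the sequence."""
    return order[start:start + length]


# Toy example: four centers on a line, deliberately shuffled.
centers = np.array([[3.0, 0, 0], [0.0, 0, 0], [1.0, 0, 0], [2.0, 0, 0]])
order = greedy_sequence(centers)
block = contiguous_block(order, start=1, length=2)
```

With an order like this, selecting a target or context block reduces to slicing a range of indices rather than running a separate neighborhood search for each block.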
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Segmentation | ShapeNet-Part | Class Average IoU | 85.8 | Point-JEPA |
| Semantic Segmentation | ShapeNet-Part | Instance Average IoU | 83.9 | Point-JEPA |
| Shape Representation Of 3D Point Clouds | ScanObjectNN | Overall Accuracy | 86.6 | Point-JEPA |
| Shape Representation Of 3D Point Clouds | ModelNet40 10-way (20-shot) | Overall Accuracy | 96.4 | Point-JEPA |
| Shape Representation Of 3D Point Clouds | ModelNet40 10-way (20-shot) | Standard Deviation | 2.7 | Point-JEPA |
| Shape Representation Of 3D Point Clouds | ModelNet40 5-way (10-shot) | Overall Accuracy | 97.4 | Point-JEPA |
| Shape Representation Of 3D Point Clouds | ModelNet40 5-way (10-shot) | Standard Deviation | 2.2 | Point-JEPA |
| Shape Representation Of 3D Point Clouds | ModelNet40 10-way (10-shot) | Overall Accuracy | 95.0 | Point-JEPA |
| Shape Representation Of 3D Point Clouds | ModelNet40 10-way (10-shot) | Standard Deviation | 3.6 | Point-JEPA |
| Shape Representation Of 3D Point Clouds | ModelNet40 5-way (20-shot) | Overall Accuracy | 99.2 | Point-JEPA |
| Shape Representation Of 3D Point Clouds | ModelNet40 5-way (20-shot) | Standard Deviation | 0.8 | Point-JEPA |
| 3D Point Cloud Classification | ScanObjectNN | Overall Accuracy | 86.6 | Point-JEPA |
| 3D Point Cloud Classification | ModelNet40 10-way (20-shot) | Overall Accuracy | 96.4 | Point-JEPA |
| 3D Point Cloud Classification | ModelNet40 10-way (20-shot) | Standard Deviation | 2.7 | Point-JEPA |
| 3D Point Cloud Classification | ModelNet40 5-way (10-shot) | Overall Accuracy | 97.4 | Point-JEPA |
| 3D Point Cloud Classification | ModelNet40 5-way (10-shot) | Standard Deviation | 2.2 | Point-JEPA |
| 3D Point Cloud Classification | ModelNet40 10-way (10-shot) | Overall Accuracy | 95.0 | Point-JEPA |
| 3D Point Cloud Classification | ModelNet40 10-way (10-shot) | Standard Deviation | 3.6 | Point-JEPA |
| 3D Point Cloud Classification | ModelNet40 5-way (20-shot) | Overall Accuracy | 99.2 | Point-JEPA |
| 3D Point Cloud Classification | ModelNet40 5-way (20-shot) | Standard Deviation | 0.8 | Point-JEPA |
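
The table reports both a class-average and an instance-average IoU for ShapeNet-Part. As a rough illustration of how the two differ, the sketch below follows the common convention: instance-average IoU is the mean over all test shapes, while class-average IoU first averages within each object category and then across categories. The function name `average_ious` and the toy values are hypothetical, and the exact evaluation protocol behind the numbers above is not stated in this section.

```python
from collections import defaultdict


def average_ious(shape_ious):
    """Compute (class_avg, instance_avg) from per-shape IoU values.

    shape_ious: list of (category, iou) pairs, one per test shape.
    """
    per_class = defaultdict(list)
    for category, iou in shape_ious:
        per_class[category].append(iou)
    # Instance average: every shape counts equally.
    instance_avg = sum(iou for _, iou in shape_ious) / len(shape_ious)
    # Class average: every category counts equally, regardless of its size.
    class_means = [sum(v) / len(v) for v in per_class.values()]
    class_avg = sum(class_means) / len(class_means)
    return class_avg, instance_avg


# Toy, made-up values: a large easy category and a small hard one.
toy = [("chair", 0.9), ("chair", 0.9), ("chair", 0.9), ("cap", 0.7)]
class_avg, instance_avg = average_ious(toy)
```

In this toy case the instance average is pulled toward the larger category, while the class average weights the small category equally, which is why the two metrics can diverge on imbalanced datasets.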