Guangyan Chen, Meiling Wang, Yi Yang, Kai Yu, Li Yuan, Yufeng Yue
Large language models (LLMs) based on the generative pre-trained transformer (GPT) have demonstrated remarkable effectiveness across a diverse range of downstream tasks. Inspired by the advances of GPT, we present PointGPT, a novel approach that extends the GPT concept to point clouds, addressing the challenges posed by the unordered nature of point clouds, their low information density, and the gaps between the pre-training task and downstream tasks. Specifically, we propose a point cloud auto-regressive generation task to pre-train transformer models. Our method partitions the input point cloud into multiple point patches and arranges them in an ordered sequence based on their spatial proximity. Then, an extractor-generator-based transformer decoder with a dual masking strategy learns latent representations conditioned on the preceding point patches and predicts the next patch in an auto-regressive manner. This scalable approach allows for learning high-capacity models that generalize well, achieving state-of-the-art performance on various downstream tasks. In particular, our approach achieves classification accuracies of 94.9% on the ModelNet40 dataset and 93.4% on the ScanObjectNN dataset, outperforming all other transformer models. Furthermore, our method attains new state-of-the-art accuracies on all four few-shot learning benchmarks.
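The patch partitioning and ordering step described above can be illustrated with a minimal NumPy sketch. The abstract does not say how patches are formed or ordered, so the choices here are assumptions made for illustration: farthest point sampling picks patch centers, k-nearest neighbors gathers each patch, and a Morton (Z-order) code on the centers provides an ordering that keeps spatially close patches near each other in the sequence. All function names (`farthest_point_sampling`, `morton_code`, `build_patch_sequence`, `causal_mask`) and default sizes are hypothetical. The causal mask only shows the auto-regressive constraint; the dual masking strategy is not reproduced.

```python
import numpy as np


def farthest_point_sampling(points, num_centers):
    """Greedily pick indices of points that are far apart from each other."""
    chosen = np.zeros(num_centers, dtype=np.int64)  # start from point 0
    # Squared distance from every point to its nearest chosen center so far.
    min_dist = np.full(points.shape[0], np.inf)
    for i in range(1, num_centers):
        diff = points - points[chosen[i - 1]]
        min_dist = np.minimum(min_dist, np.einsum("ij,ij->i", diff, diff))
        chosen[i] = int(np.argmax(min_dist))
    return chosen


def morton_code(centers, bits=10):
    """Interleave the bits of quantized x, y, z coordinates (Z-order curve)."""
    lo = centers.min(axis=0)
    span = np.maximum(centers.max(axis=0) - lo, 1e-12)
    q = np.floor((centers - lo) / span * (2**bits - 1)).astype(np.int64)
    codes = np.zeros(centers.shape[0], dtype=np.int64)
    for b in range(bits):
        for axis in range(3):
            codes |= ((q[:, axis] >> b) & 1) << (3 * b + axis)
    return codes


def build_patch_sequence(points, num_patches=8, patch_size=16):
    """Return (patches, centers), both sorted by the Morton code of the centers."""
    centers = points[farthest_point_sampling(points, num_patches)]
    # Squared distances from each center to every point: shape (G, N).
    d2 = ((centers[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    knn_idx = np.argsort(d2, axis=1)[:, :patch_size]
    # Express each patch in coordinates relative to its own center.
    patches = points[knn_idx] - centers[:, None, :]
    order = np.argsort(morton_code(centers), kind="stable")
    return patches[order], centers[order]


def causal_mask(seq_len):
    """True where attention is allowed: position i may attend to j <= i."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))


rng = np.random.default_rng(0)
cloud = rng.random((256, 3))  # synthetic points, for illustration only
patches, centers = build_patch_sequence(cloud)

# Next-patch prediction pairs: patch t is the target for the prefix ending at t-1.
inputs, targets = patches[:-1], patches[1:]
mask = causal_mask(inputs.shape[0])
print(patches.shape, inputs.shape, targets.shape, mask.shape)
```

In this sketch, shifting the ordered sequence by one position yields the input/target pairs, and the lower-triangular mask keeps each position from attending to later patches. A real model would embed the patches and pass them, together with the mask, through transformer blocks.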
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Shape Representation Of 3D Point Clouds | ScanObjectNN | OBJ-BG (OA) | 97.2 | PointGPT |
| Shape Representation Of 3D Point Clouds | ScanObjectNN | OBJ-ONLY (OA) | 96.6 | PointGPT |
| Shape Representation Of 3D Point Clouds | ScanObjectNN | Overall Accuracy | 93.4 | PointGPT |
| Shape Representation Of 3D Point Clouds | ModelNet40 10-way (20-shot) | Overall Accuracy | 96.1 | PointGPT |
| Shape Representation Of 3D Point Clouds | ModelNet40 10-way (20-shot) | Standard Deviation | 2.8 | PointGPT |
| Shape Representation Of 3D Point Clouds | ModelNet40 5-way (10-shot) | Overall Accuracy | 98.0 | PointGPT |
| Shape Representation Of 3D Point Clouds | ModelNet40 5-way (10-shot) | Standard Deviation | 1.9 | PointGPT |
| Shape Representation Of 3D Point Clouds | ModelNet40 10-way (10-shot) | Overall Accuracy | 94.3 | PointGPT |
| Shape Representation Of 3D Point Clouds | ModelNet40 10-way (10-shot) | Standard Deviation | 3.3 | PointGPT |
| Shape Representation Of 3D Point Clouds | ModelNet40 5-way (20-shot) | Overall Accuracy | 99.0 | PointGPT |
| Shape Representation Of 3D Point Clouds | ModelNet40 5-way (20-shot) | Standard Deviation | 1.0 | PointGPT |
| 3D Point Cloud Classification | ScanObjectNN | OBJ-BG (OA) | 97.2 | PointGPT |
| 3D Point Cloud Classification | ScanObjectNN | OBJ-ONLY (OA) | 96.6 | PointGPT |
| 3D Point Cloud Classification | ScanObjectNN | Overall Accuracy | 93.4 | PointGPT |
| 3D Point Cloud Classification | ModelNet40 10-way (20-shot) | Overall Accuracy | 96.1 | PointGPT |
| 3D Point Cloud Classification | ModelNet40 10-way (20-shot) | Standard Deviation | 2.8 | PointGPT |
| 3D Point Cloud Classification | ModelNet40 5-way (10-shot) | Overall Accuracy | 98.0 | PointGPT |
| 3D Point Cloud Classification | ModelNet40 5-way (10-shot) | Standard Deviation | 1.9 | PointGPT |
| 3D Point Cloud Classification | ModelNet40 10-way (10-shot) | Overall Accuracy | 94.3 | PointGPT |
| 3D Point Cloud Classification | ModelNet40 10-way (10-shot) | Standard Deviation | 3.3 | PointGPT |
| 3D Point Cloud Classification | ModelNet40 5-way (20-shot) | Overall Accuracy | 99.0 | PointGPT |
| 3D Point Cloud Classification | ModelNet40 5-way (20-shot) | Standard Deviation | 1.0 | PointGPT |
| 3D Point Cloud Reconstruction | ScanObjectNN | OBJ-BG (OA) | 97.2 | PointGPT |
| 3D Point Cloud Reconstruction | ScanObjectNN | OBJ-ONLY (OA) | 96.6 | PointGPT |
| 3D Point Cloud Reconstruction | ScanObjectNN | Overall Accuracy | 93.4 | PointGPT |
| 3D Point Cloud Reconstruction | ModelNet40 10-way (20-shot) | Overall Accuracy | 96.1 | PointGPT |
| 3D Point Cloud Reconstruction | ModelNet40 10-way (20-shot) | Standard Deviation | 2.8 | PointGPT |
| 3D Point Cloud Reconstruction | ModelNet40 5-way (10-shot) | Overall Accuracy | 98.0 | PointGPT |
| 3D Point Cloud Reconstruction | ModelNet40 5-way (10-shot) | Standard Deviation | 1.9 | PointGPT |
| 3D Point Cloud Reconstruction | ModelNet40 10-way (10-shot) | Overall Accuracy | 94.3 | PointGPT |
| 3D Point Cloud Reconstruction | ModelNet40 10-way (10-shot) | Standard Deviation | 3.3 | PointGPT |
| 3D Point Cloud Reconstruction | ModelNet40 5-way (20-shot) | Overall Accuracy | 99.0 | PointGPT |
| 3D Point Cloud Reconstruction | ModelNet40 5-way (20-shot) | Standard Deviation | 1.0 | PointGPT |