Guocheng Qian, Yuchen Li, Houwen Peng, Jinjie Mai, Hasan Abed Al Kader Hammoud, Mohamed Elhoseiny, Bernard Ghanem
PointNet++ is one of the most influential neural architectures for point cloud understanding. Although the accuracy of PointNet++ has been largely surpassed by recent networks such as PointMLP and Point Transformer, we find that a large portion of the performance gain is due to improved training strategies, i.e., data augmentation and optimization techniques, and increased model sizes rather than architectural innovations. Thus, the full potential of PointNet++ has yet to be explored. In this work, we revisit the classical PointNet++ through a systematic study of model training and scaling strategies, and offer two major contributions. First, we propose a set of improved training strategies that significantly boost PointNet++ performance. For example, we show that, without any change in architecture, the overall accuracy (OA) of PointNet++ on ScanObjectNN object classification can be raised from 77.9% to 86.1%, even outperforming state-of-the-art PointMLP. Second, we introduce an inverted residual bottleneck design and separable MLPs into PointNet++ to enable efficient and effective model scaling and propose PointNeXt, the next version of PointNets. PointNeXt can be flexibly scaled up and outperforms state-of-the-art methods on both 3D classification and segmentation tasks. For classification, PointNeXt reaches an overall accuracy of 87.7% on ScanObjectNN, surpassing PointMLP by 2.3 percentage points, while being 10x faster in inference. For semantic segmentation, PointNeXt establishes a new state-of-the-art performance with 74.9% mean IoU on S3DIS (6-fold cross-validation), being superior to the recent Point Transformer. The code and models are available at https://github.com/guochengqian/pointnext.
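
To make the "inverted residual bottleneck" and "separable MLP" ideas more concrete, here is a minimal pure-Python sketch of one such block. It is an illustration under stated assumptions, not the authors' implementation: the function names (`inv_res_block`, `make_params`), the expansion ratio of 4, the use of max-pooling, and the omission of relative coordinates and normalization layers are all choices made for this example. The block applies a single shared layer to neighbor features, pools over the neighborhood, then runs a point-wise expand-and-project MLP and adds the input back.

```python
import random


def linear(vec, weight, bias):
    """Apply y = W x + b to one feature vector (weight is out_dim x in_dim)."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weight, bias)]


def relu(vec):
    return [max(0.0, v) for v in vec]


def init_linear(in_dim, out_dim, rng):
    """Small random weights and zero bias (hypothetical initialisation)."""
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weight, [0.0] * out_dim


def make_params(channels, expansion, rng):
    """Parameters for one block; `expansion` is an assumed hyperparameter."""
    return {
        "neighbor": init_linear(channels, channels, rng),            # shared over neighbors
        "expand": init_linear(channels, channels * expansion, rng),  # C -> expansion * C
        "project": init_linear(channels * expansion, channels, rng), # expansion * C -> C
    }


def inv_res_block(features, neighbors, params):
    """features: N x C nested list; neighbors: N lists of neighbor indices."""
    out = []
    for i, idx in enumerate(neighbors):
        # 1) One shared layer on each neighbor's features (the only
        #    neighborhood-level MLP, which is what makes the MLP "separable").
        grouped = [relu(linear(features[j], *params["neighbor"])) for j in idx]
        # 2) Channel-wise max-pooling over the neighborhood.
        pooled = [max(channel) for channel in zip(*grouped)]
        # 3) Point-wise inverted bottleneck: expand the width, then project back.
        hidden = relu(linear(pooled, *params["expand"]))
        projected = linear(hidden, *params["project"])
        # 4) Residual connection with the block input, followed by activation.
        out.append(relu([p + f for p, f in zip(projected, features[i])]))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    params = make_params(channels=4, expansion=4, rng=rng)
    feats = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
    nbrs = [[i, (i + 1) % 5, (i + 2) % 5] for i in range(5)]  # toy neighborhoods
    print(len(inv_res_block(feats, nbrs, params)), "points processed")
```

Because the expensive neighborhood-level computation is limited to one layer and the rest runs once per point, widening or stacking such blocks grows the cost more slowly than widening every layer of a neighborhood-level MLP, which is the intuition behind the scaling claim above.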
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Segmentation | S3DIS Area5 | mAcc | 77.2 | PointNeXt |
| Semantic Segmentation | S3DIS Area5 | mIoU | 71.1 | PointNeXt |
| Semantic Segmentation | S3DIS Area5 | oAcc | 91 | PointNeXt |
| Semantic Segmentation | S3DIS | Mean IoU | 74.9 | PointNeXt-XL |
| Semantic Segmentation | S3DIS | Params (M) | 41.6 | PointNeXt-XL |
| Semantic Segmentation | S3DIS | mAcc | 83 | PointNeXt-XL |
| Semantic Segmentation | S3DIS | oAcc | 90.3 | PointNeXt-XL |
| Semantic Segmentation | S3DIS | Mean IoU | 73.9 | PointNeXt-L |
| Semantic Segmentation | S3DIS | Params (M) | 7.1 | PointNeXt-L |
| Semantic Segmentation | S3DIS | mAcc | 82.2 | PointNeXt-L |
| Semantic Segmentation | S3DIS | oAcc | 89.9 | PointNeXt-L |
| Semantic Segmentation | OpenTrench3D | mAcc | 79.7 | PointNeXt-XL |
| Semantic Segmentation | OpenTrench3D | mIoU | 70.6 | PointNeXt-XL |
| Semantic Segmentation | S3DIS | mIoU (6-Fold) | 74.9 | PointNeXt |
| Semantic Segmentation | S3DIS | mIoU (Area-5) | 70.5 | PointNeXt |
| Semantic Segmentation | ShapeNet-Part | Class Average IoU | 85.2 | PointNeXt |
| Semantic Segmentation | ShapeNet-Part | Instance Average IoU | 87.1 | PointNeXt |
| 3D Point Cloud Classification | ScanObjectNN | Mean Accuracy | 86.8 | PointNeXt |
| 3D Point Cloud Classification | ScanObjectNN | Overall Accuracy | 88.2 | PointNeXt |
| 3D Point Cloud Classification | ModelNet40 | Mean Accuracy | 91.1 | PointNeXt |
| 3D Point Cloud Classification | ModelNet40 | Overall Accuracy | 94 | PointNeXt |
| 3D Point Cloud Classification | ScanObjectNN | GFLOPs | 3.6 | PointNeXt |
| 3D Point Cloud Classification | ScanObjectNN | Number of params (M) | 1.4 | PointNeXt |
| 3D Point Cloud Classification | ScanObjectNN | Overall Accuracy (PB_T50_RS) | 87.8 | PointNeXt |
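
The table reports overall accuracy (oAcc / OA), mean class accuracy (mAcc), and mean IoU (mIoU). As a reminder of how these quantities relate, the following sketch computes all three from a confusion matrix. The function name `segmentation_metrics` and the toy matrix are hypothetical, and the choice to skip classes that never occur is an assumption; benchmark toolkits may handle absent classes differently.

```python
def segmentation_metrics(confusion):
    """Return (overall accuracy, mean class accuracy, mean IoU).

    confusion[i][j] is the number of points whose true class is i
    and whose predicted class is j.
    """
    num_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[c][c] for c in range(num_classes))
    overall_acc = correct / total

    per_class_acc, per_class_iou = [], []
    for c in range(num_classes):
        tp = confusion[c][c]
        gt = sum(confusion[c])                    # points labeled c
        pred = sum(row[c] for row in confusion)   # points predicted as c
        union = gt + pred - tp
        if gt > 0:                                # assumption: skip absent classes
            per_class_acc.append(tp / gt)
        if union > 0:
            per_class_iou.append(tp / union)

    mean_acc = sum(per_class_acc) / len(per_class_acc)
    mean_iou = sum(per_class_iou) / len(per_class_iou)
    return overall_acc, mean_acc, mean_iou


if __name__ == "__main__":
    toy = [[8, 2],
           [1, 9]]
    oa, macc, miou = segmentation_metrics(toy)
    print(f"OA={oa:.3f}  mAcc={macc:.3f}  mIoU={miou:.3f}")
```

Overall accuracy weights every point equally, so frequent classes dominate it, whereas mAcc and mIoU average over classes first. This is why a model can show a high oAcc alongside a noticeably lower mIoU on the same dataset.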