Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Point Transformer

Hengshuang Zhao, Li Jiang, Jiaya Jia, Philip Torr, Vladlen Koltun

2020-12-16, ICCV 2021 10
Tasks: Image Classification, Scene Segmentation, Segmentation, Semantic Segmentation, Point Cloud Segmentation, General Classification, 3D Semantic Segmentation, 3D Part Segmentation, 3D Point Cloud Classification, Object Detection

Abstract

Self-attention networks have revolutionized natural language processing and are making impressive strides in image analysis tasks such as image classification and object detection. Inspired by this success, we investigate the application of self-attention networks to 3D point cloud processing. We design self-attention layers for point clouds and use these to construct self-attention networks for tasks such as semantic scene segmentation, object part segmentation, and object classification. Our Point Transformer design improves upon prior work across domains and tasks. For example, on the challenging S3DIS dataset for large-scale semantic scene segmentation, the Point Transformer attains an mIoU of 70.4% on Area 5, outperforming the strongest prior model by 3.3 absolute percentage points and crossing the 70% mIoU threshold for the first time.
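
The abstract quotes mIoU, and the results for this paper also report mAcc and oAcc. As a rough illustration of how these three segmentation metrics are commonly derived from a confusion matrix, here is a minimal sketch. The function name `segmentation_metrics` and the toy counts are hypothetical and chosen only for illustration; this is not the authors' evaluation code, and benchmark protocols may differ in details such as how empty classes are handled.

```python
import numpy as np

def segmentation_metrics(conf):
    """Return (oAcc, mAcc, mIoU) as fractions in [0, 1].

    conf[i, j] counts points whose true class is i and predicted class is j.
    Assumption: classes with no ground-truth points are skipped for mAcc,
    and classes with an empty union are skipped for mIoU.
    """
    conf = np.asarray(conf, dtype=np.float64)
    tp = np.diag(conf)          # correctly labelled points per class
    gt = conf.sum(axis=1)       # points per true class
    pred = conf.sum(axis=0)     # points per predicted class
    union = gt + pred - tp      # |truth OR prediction| per class

    o_acc = tp.sum() / conf.sum()                       # overall accuracy
    m_acc = np.mean(tp[gt > 0] / gt[gt > 0])            # mean per-class accuracy
    m_iou = np.mean(tp[union > 0] / union[union > 0])   # mean intersection over union
    return o_acc, m_acc, m_iou

# Toy 3-class confusion matrix (made-up counts, not from any benchmark).
toy = [[8, 2, 0],
       [1, 7, 2],
       [0, 1, 9]]
o_acc, m_acc, m_iou = segmentation_metrics(toy)
print(f"oAcc={o_acc:.3f} mAcc={m_acc:.3f} mIoU={m_iou:.3f}")
```

Leaderboards usually report these values as percentages, so the fractions returned here would be multiplied by 100.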

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Segmentation | S3DIS Area5 | mAcc | 76.5 | PointTransformer |
| Semantic Segmentation | S3DIS Area5 | mIoU | 70.4 | PointTransformer |
| Semantic Segmentation | S3DIS Area5 | oAcc | 90.8 | PointTransformer |
| Semantic Segmentation | S3DIS Area5 | mIoU | 57.3 | PointCNN |
| Semantic Segmentation | S3DIS Area5 | mIoU | 41.1 | PointNet |
| Semantic Segmentation | S3DIS | Mean IoU | 73.5 | PointTransformer |
| Semantic Segmentation | S3DIS | Params (M) | 7.8 | PointTransformer |
| Semantic Segmentation | S3DIS | mAcc | 81.9 | PointTransformer |
| Semantic Segmentation | S3DIS | oAcc | 90.2 | PointTransformer |
| Semantic Segmentation | S3DIS | Mean IoU | 70.6 | KPConv |
| Semantic Segmentation | S3DIS | Params (M) | 14.1 | KPConv |
| Semantic Segmentation | S3DIS | Mean IoU | 65.4 | PointCNN |
| Semantic Segmentation | S3DIS | Mean IoU | 62.1 | SPGraph |
| Semantic Segmentation | S3DIS | Mean IoU | 47.6 | PointNet |
| Semantic Segmentation | STPLS3D | mIOU | 47.64 | Point transformer |
| Semantic Segmentation | S3DIS | mIoU (6-Fold) | 73.5 | PointTransformer |
| Semantic Segmentation | S3DIS | mIoU (Area-5) | 70.4 | PointTransformer |
| Semantic Segmentation | ShapeNet-Part | Class Average IoU | 83.7 | PointTransformer |
| Semantic Segmentation | ShapeNet-Part | Instance Average IoU | 86.6 | PointTransformer |
| Shape Representation Of 3D Point Clouds | ModelNet40 | Mean Accuracy | 90.6 | PointTransformer |
| Shape Representation Of 3D Point Clouds | ModelNet40 | Overall Accuracy | 93.7 | PointTransformer |
| 3D Semantic Segmentation | STPLS3D | mIOU | 47.64 | Point transformer |
| 3D Semantic Segmentation | S3DIS | mIoU (6-Fold) | 73.5 | PointTransformer |
| 3D Semantic Segmentation | S3DIS | mIoU (Area-5) | 70.4 | PointTransformer |
| 3D Point Cloud Classification | ModelNet40 | Mean Accuracy | 90.6 | PointTransformer |
| 3D Point Cloud Classification | ModelNet40 | Overall Accuracy | 93.7 | PointTransformer |
| Point Cloud Segmentation | PointCloud-C | mean Corruption Error (mCE) | 1.049 | PointTransformers |

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations (2025-07-18)
Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy (2025-07-17)
Federated Learning for Commercial Image Sources (2025-07-17)
MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)