Maxim Kolodiazhnyi, Anna Vorontsova, Anton Konushin, Danila Rukhovich
Semantic, instance, and panoptic segmentation of 3D point clouds have been addressed using task-specific models of distinct designs. As a result, the similarity of these segmentation tasks and the implicit relationship between them have not been exploited effectively. This paper presents a unified, simple, and effective model that addresses all three tasks jointly. The model, named OneFormer3D, performs instance and semantic segmentation consistently, using a group of learnable kernels, where each kernel is responsible for generating a mask for either an instance or a semantic category. These kernels are trained with a transformer-based decoder that takes unified instance and semantic queries as input. This design allows the model to be trained end-to-end in a single run, so that it achieves top performance on all three segmentation tasks simultaneously. Specifically, OneFormer3D ranks 1st and sets a new state of the art (+2.1 mAP50) on the ScanNet test leaderboard. We also demonstrate state-of-the-art results in semantic, instance, and panoptic segmentation on the ScanNet (+21 PQ), ScanNet200 (+3.8 mAP50), and S3DIS (+0.8 mIoU) datasets.
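
The kernel idea described above can be illustrated with a toy sketch. It is not the authors' implementation: it assumes, as is common in kernel-based segmentation, that each kernel scores every point through a dot product with a per-point feature vector, followed by a sigmoid and a threshold. The function name `predict_masks`, the feature values, and the 0.5 threshold are all hypothetical, and the transformer decoder that would produce the kernels from queries is omitted.

```python
import math


def sigmoid(x):
    """Logistic function mapping a raw score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_masks(kernels, point_features, threshold=0.5):
    """Return one binary mask per kernel (hypothetical helper).

    Assumption: a kernel scores a point by the dot product of the kernel
    with that point's feature vector; sigmoid + threshold binarizes it.
    """
    masks = []
    for kernel in kernels:
        scores = [
            sum(k * f for k, f in zip(kernel, feat)) for feat in point_features
        ]
        masks.append([sigmoid(s) > threshold for s in scores])
    return masks


# Toy scene: 4 points with 2-D features (made-up values).
point_features = [[2.0, 0.0], [1.5, -0.5], [-1.0, 2.0], [-2.0, 1.0]]

# One unified set of kernels: two stand in for instance kernels,
# one for a semantic-category kernel. All are scored the same way.
instance_kernels = [[1.0, 0.0], [-1.0, 0.0]]
semantic_kernels = [[0.0, 1.0]]
kernels = instance_kernels + semantic_kernels

masks = predict_masks(kernels, point_features)
for name, mask in zip(["instance 0", "instance 1", "semantic 0"], masks):
    print(name, mask)
```

In this sketch the instance and semantic kernels go through exactly the same mask-generation path, which is the point of a unified design: only the interpretation of each output mask differs.
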
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Segmentation | ScanNet | val mIoU | 76.6 | OneFormer3D |
| Semantic Segmentation | ScanNet200 | val mIoU | 30.1 | OneFormer3D |
| Semantic Segmentation | S3DIS | mIoU (6-Fold) | 75 | OneFormer3D |
| Semantic Segmentation | S3DIS | mIoU (Area-5) | 72.4 | OneFormer3D |
| Object Detection | ScanNetV2 | mAP@0.25 | 76.9 | OneFormer3D |
| Object Detection | ScanNetV2 | mAP@0.5 | 65.3 | OneFormer3D |
| Instance Segmentation | S3DIS | AP@50 | 75.8 | OneFormer3D |
| Instance Segmentation | S3DIS | mAP | 63 | OneFormer3D |
| Instance Segmentation | S3DIS | mPrec | 82.3 | OneFormer3D |
| Instance Segmentation | S3DIS | mRec | 74.1 | OneFormer3D |
| Instance Segmentation | ScanNet(v2) | mAP | 56.6 | OneFormer3D |
| Instance Segmentation | ScanNet(v2) | mAP@50 | 80.1 | OneFormer3D |
| Instance Segmentation | ScanNet(v2) | mAP@25 | 89.6 | OneFormer3D |
| 3D Semantic Segmentation | ScanNet200 | val mIoU | 30.1 | OneFormer3D |
| 3D Semantic Segmentation | S3DIS | mIoU (6-Fold) | 75 | OneFormer3D |
| 3D Semantic Segmentation | S3DIS | mIoU (Area-5) | 72.4 | OneFormer3D |
| 3D Object Detection | ScanNetV2 | mAP@0.25 | 76.9 | OneFormer3D |
| 3D Object Detection | ScanNetV2 | mAP@0.5 | 65.3 | OneFormer3D |
| Panoptic Segmentation | ScanNet | PQ | 71.2 | OneFormer3D |
| Panoptic Segmentation | ScanNet | PQ_st | 86.1 | OneFormer3D |
| Panoptic Segmentation | ScanNet | PQ_th | 69.6 | OneFormer3D |
| Panoptic Segmentation | ScanNetV2 | PQ | 71.2 | OneFormer3D |
| 3D Instance Segmentation | S3DIS | AP@50 | 75.8 | OneFormer3D |
| 3D Instance Segmentation | S3DIS | mAP | 63 | OneFormer3D |
| 3D Instance Segmentation | S3DIS | mPrec | 82.3 | OneFormer3D |
| 3D Instance Segmentation | S3DIS | mRec | 74.1 | OneFormer3D |
| 3D Instance Segmentation | ScanNet(v2) | mAP | 56.6 | OneFormer3D |
| 3D Instance Segmentation | ScanNet(v2) | mAP@50 | 80.1 | OneFormer3D |
| 3D Instance Segmentation | ScanNet(v2) | mAP@25 | 89.6 | OneFormer3D |