Daniel Seichter, Benedict Stephan, Söhnke Benedikt Fischedick, Steffen Müller, Leonard Rabes, Horst-Michael Gross
As the application scenarios of mobile robots become more complex and challenging, scene understanding becomes increasingly crucial. A mobile robot that is supposed to operate autonomously in indoor environments must have precise knowledge about what objects are present, where they are, what their spatial extent is, and how they can be reached; i.e., information about free space is also required. Panoptic mapping is a powerful instrument for providing such information. However, building 3D panoptic maps with high spatial resolution is challenging on mobile robots, given their limited computing capabilities. In this paper, we propose PanopticNDT, an efficient and robust panoptic mapping approach based on occupancy normal distribution transform (NDT) mapping. We evaluate our approach on the publicly available datasets Hypersim and ScanNetV2. The results reveal that our approach can represent panoptic information at a higher level of detail than other state-of-the-art approaches while enabling real-time panoptic mapping on mobile robots. Finally, we demonstrate the real-world applicability of PanopticNDT with qualitative results in a domestic application.
| Task | Dataset | Metric | Value (%) | Model |
|---|---|---|---|---|
| Semantic Segmentation | ScanNet | test mIoU | 68.1 | PanopticNDT (10cm) |
| Semantic Segmentation | ScanNet | val mIoU | 68.39 | PanopticNDT (10cm) |
| Semantic Segmentation | NYU Depth v2 | Mean IoU | 59.02 | EMSANet (2x ResNet-34 NBt1D, PanopticNDT version, finetuned) |
| Semantic Segmentation | Hypersim | mIoU | 49.74 | EMSANet (2x ResNet-34 NBt1D) |
| Semantic Segmentation | Hypersim | mIoU (test) | 46.66 | EMSANet (2x ResNet-34 NBt1D) |
| 3D Semantic Segmentation | Hypersim | mIoU | 45.43 | PanopticNDT (10cm) |
| 3D Semantic Segmentation | Hypersim | mIoU (test) | 45.34 | PanopticNDT (10cm) |
| 3D Semantic Segmentation | Hypersim | mIoU | 44.31 | SemanticNDT (10cm) |
| 3D Semantic Segmentation | Hypersim | mIoU (test) | 44.8 | SemanticNDT (10cm) |
| Panoptic Segmentation | NYU Depth v2 | PQ | 51.15 | EMSANet (2x ResNet-34 NBt1D, PanopticNDT version, finetuned) |
| Panoptic Segmentation | ScanNetV2 | PQ | 59.19 | PanopticNDT (10cm) |
| Panoptic Segmentation | Hypersim | PQ | 34.95 | EMSANet (2x ResNet-34 NBt1D) |
| Panoptic Segmentation | Hypersim | PQ (test) | 29.77 | EMSANet (2x ResNet-34 NBt1D) |
| Panoptic Segmentation | Hypersim | mIoU | 49.12 | EMSANet (2x ResNet-34 NBt1D) |
| Panoptic Segmentation | Hypersim | mIoU (test) | 44.66 | EMSANet (2x ResNet-34 NBt1D) |
| 2D Panoptic Segmentation | ScanNetV2 | PQ | 58.22 | EMSANet (2x ResNet-34 NBt1D, PanopticNDT version) |
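
For readers unfamiliar with the two metrics reported in the table, the following is a minimal sketch of how mIoU and PQ are commonly defined. It is not the authors' evaluation code: the function names `mean_iou` and `panoptic_quality` are hypothetical, the confusion matrix is assumed to have ground truth in rows and predictions in columns, and segment matching for PQ (pairs with IoU > 0.5) is assumed to have been done beforehand. Both functions return fractions in [0, 1]; the table lists the corresponding percentages.

```python
import numpy as np


def mean_iou(confusion: np.ndarray) -> float:
    """Mean IoU from a (C, C) confusion matrix.

    Assumption: rows are ground-truth classes, columns are predictions.
    """
    tp = np.diag(confusion).astype(float)
    fp = confusion.sum(axis=0) - tp  # predicted as class c, but wrong
    fn = confusion.sum(axis=1) - tp  # class c in ground truth, but missed
    denom = tp + fp + fn
    valid = denom > 0  # skip classes absent from both GT and prediction
    return float((tp[valid] / denom[valid]).mean())


def panoptic_quality(matched_ious, num_fp: int, num_fn: int) -> float:
    """PQ for a single class.

    matched_ious: IoUs of matched (true-positive) segment pairs.
    num_fp / num_fn: counts of unmatched predicted / ground-truth segments.
    """
    tp = len(matched_ious)
    denom = tp + 0.5 * num_fp + 0.5 * num_fn
    if denom == 0:
        return 0.0
    return float(sum(matched_ious) / denom)


if __name__ == "__main__":
    conf = np.array([[3, 1],
                     [1, 5]])
    print(f"mIoU: {100 * mean_iou(conf):.2f} %")
    print(f"PQ:   {100 * panoptic_quality([0.8, 0.6], 1, 1):.2f} %")
```

A dataset-level PQ is usually obtained by averaging the per-class PQ values over all classes.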