Rohit Mohan, Abhinav Valada
Understanding the scene in which an autonomous robot operates is critical for its competent functioning. Such scene comprehension requires recognizing instances of traffic participants along with general scene semantics, which can be effectively addressed by the panoptic segmentation task. In this paper, we introduce the Efficient Panoptic Segmentation (EfficientPS) architecture, which consists of a shared backbone that efficiently encodes and fuses semantically rich multi-scale features. We incorporate a new semantic head that coherently aggregates fine and contextual features, and a new variant of Mask R-CNN as the instance head. We also propose a novel panoptic fusion module that congruously integrates the output logits from both heads of our EfficientPS architecture to yield the final panoptic segmentation output. Additionally, we introduce the KITTI panoptic segmentation dataset, which contains panoptic annotations for the popular and challenging KITTI benchmark. Extensive evaluations on Cityscapes, KITTI, Mapillary Vistas, and the Indian Driving Dataset demonstrate that our proposed architecture consistently sets a new state of the art on all four benchmarks while being the most efficient and fastest panoptic segmentation architecture to date.
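The abstract states only that the fusion module integrates the output logits of the semantic and instance heads. The NumPy sketch below shows one way such a merge can work. Several details are assumptions for illustration and are not specified here: the fusion rule `(sigmoid(A) + sigmoid(B)) * (A + B)`, which combines the instance-head mask logits A with the semantic logits B of the same class; the channel layout with stuff classes first; and the hypothetical helper names `fuse_logits` and `panoptic_fusion`. It also omits confidence filtering, cropping the semantic logits to each detection's box, and small-region removal, all of which a full implementation would likely need.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def fuse_logits(ml_a: np.ndarray, ml_b: np.ndarray) -> np.ndarray:
    # Assumed fusion rule (hypothetical helper): the sum of sigmoids acts as a
    # soft agreement weight on the summed logits, so pixels where both heads
    # are confident get large fused logits.
    return (sigmoid(ml_a) + sigmoid(ml_b)) * (ml_a + ml_b)


def panoptic_fusion(semantic_logits, instance_logits, instance_classes, num_stuff):
    """Merge semantic-head and instance-head logits into a panoptic map (sketch).

    semantic_logits:  (C, H, W); channels [0, num_stuff) are assumed 'stuff'.
    instance_logits:  (N, H, W) full-image mask logits, one per kept detection.
    instance_classes: (N,) semantic class id of each detection ('thing' class).
    Returns (class_map, instance_map); instance_map is 0 on stuff pixels.
    """
    instance_classes = np.asarray(instance_classes, dtype=int)
    if instance_classes.size == 0:
        # No detections: label every pixel with its best stuff class.
        class_map = semantic_logits[:num_stuff].argmax(axis=0)
        return class_map, np.zeros(semantic_logits.shape[1:], dtype=int)

    # Semantic logits of each detection's own class channel.
    ml_b = semantic_logits[instance_classes]              # (N, H, W)
    fused = fuse_logits(instance_logits, ml_b)            # (N, H, W)

    # Stuff channels compete with the fused per-instance channels.
    stacked = np.concatenate([semantic_logits[:num_stuff], fused], axis=0)
    winner = stacked.argmax(axis=0)                       # (H, W)

    is_instance = winner >= num_stuff
    inst_idx = np.where(is_instance, winner - num_stuff, 0)
    class_map = np.where(is_instance, instance_classes[inst_idx], winner)
    instance_map = np.where(is_instance, inst_idx + 1, 0)
    return class_map, instance_map


# Toy 1x3 image: road (stuff 0), sky (stuff 1), car (thing 2).
sem = np.array([[[5.0, 0.0, 0.0]],
                [[0.0, 5.0, 0.0]],
                [[0.0, 0.0, 4.0]]])
inst = np.array([[[-5.0, -5.0, 4.0]]])  # one car detection
classes, ids = panoptic_fusion(sem, inst, [2], num_stuff=2)
print(classes.tolist(), ids.tolist())  # [[0, 1, 2]] [[0, 0, 1]]
```

Because the fused logit grows only where both heads agree, a detection whose mask disagrees with the semantic head loses those pixels to the competing stuff channels.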
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Panoptic Segmentation | Cityscapes test | PQ | 67.1 | EfficientPS |
| Panoptic Segmentation | Cityscapes test | PQ | 62.9 | EfficientPS (Cityscapes-fine) |
| Panoptic Segmentation | Cityscapes val | AP | 43.5 | EfficientPS |
| Panoptic Segmentation | Cityscapes val | PQ | 67.5 | EfficientPS |
| Panoptic Segmentation | Cityscapes val | PQ (stuff) | 70.3 | EfficientPS |
| Panoptic Segmentation | Cityscapes val | PQ (things) | 63.2 | EfficientPS |
| Panoptic Segmentation | Cityscapes val | mIoU | 82.1 | EfficientPS |
| Panoptic Segmentation | Cityscapes val | AP | 39.1 | EfficientPS (Cityscapes-fine) |
| Panoptic Segmentation | Cityscapes val | PQ | 64.9 | EfficientPS (Cityscapes-fine) |
| Panoptic Segmentation | Cityscapes val | PQ (stuff) | 67.7 | EfficientPS (Cityscapes-fine) |
| Panoptic Segmentation | Cityscapes val | PQ (things) | 61.0 | EfficientPS (Cityscapes-fine) |
| Panoptic Segmentation | Cityscapes val | mIoU | 90.3 | EfficientPS (Cityscapes-fine) |
| Panoptic Segmentation | Mapillary Vistas val | PQ | 40.6 | EfficientPS |
| Panoptic Segmentation | KITTI Panoptic Segmentation | PQ | 43.7 | EfficientPS |
| Panoptic Segmentation | Indian Driving Dataset | PQ | 51.1 | EfficientPS |
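Most results above are reported as panoptic quality (PQ) or as its averages over thing and stuff classes. As a reminder of how PQ is computed for a single class, here is a minimal Python sketch of the standard definition, PQ = (sum of IoU over true positives) / (|TP| + 0.5 |FP| + 0.5 |FN|). A predicted segment and a ground-truth segment match when their IoU exceeds 0.5. The sketch ignores void pixels and crowd regions, which the official evaluation scripts handle, and `iou` and `panoptic_quality` are hypothetical helper names.

```python
import numpy as np


def iou(a: np.ndarray, b: np.ndarray) -> float:
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0


def panoptic_quality(pred_masks, gt_masks) -> float:
    """PQ for one class from lists of boolean segment masks (sketch).

    Assumes segments within each list do not overlap, so any pair with
    IoU > 0.5 is a unique match. Void and crowd handling is omitted.
    """
    matched_pred = set()
    iou_sum, tp = 0.0, 0
    for g in gt_masks:
        for j, p in enumerate(pred_masks):
            if j in matched_pred:
                continue
            v = iou(p, g)
            if v > 0.5:  # matching threshold from the PQ definition
                matched_pred.add(j)
                iou_sum += v
                tp += 1
                break
    fp = len(pred_masks) - tp  # unmatched predictions
    fn = len(gt_masks) - tp    # unmatched ground-truth segments
    denom = tp + 0.5 * fp + 0.5 * fn
    return iou_sum / denom if denom else 0.0


gt = [np.array([1, 1, 0, 0], bool), np.array([0, 0, 1, 1], bool)]
pred = [np.array([1, 1, 1, 0], bool)]
print(round(panoptic_quality(pred, gt), 4))  # 0.4444: one TP (IoU 2/3), one FN
```

Dataset-level PQ averages this per-class value over all classes. The thing and stuff variants average it over thing classes and stuff classes only.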