Yanwei Li, Hengshuang Zhao, Xiaojuan Qi, LiWei Wang, Zeming Li, Jian Sun, Jiaya Jia
In this paper, we present a conceptually simple, strong, and efficient framework for panoptic segmentation, called Panoptic FCN. Our approach aims to represent and predict foreground things and background stuff in a unified fully convolutional pipeline. In particular, Panoptic FCN encodes each object instance or stuff category into a specific kernel weight with the proposed kernel generator and produces the prediction by convolving the high-resolution feature directly. With this approach, the instance-aware property required for things and the semantically consistent property required for stuff can both be satisfied in a simple generate-kernel-then-segment workflow. Without extra boxes for localization or instance separation, the proposed approach outperforms previous box-based and box-free models with high efficiency on the COCO, Cityscapes, and Mapillary Vistas datasets with single-scale input. Our code is made publicly available at https://github.com/Jia-Research-Lab/PanopticFCN.
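
The generate-kernel-then-segment idea can be illustrated with a minimal sketch. It assumes the kernels have already been produced (one weight vector per thing instance or stuff category) and shows only the final step, where each kernel is applied to the high-resolution feature as a 1x1 convolution. The function name `segment`, the toy feature map, and the zero threshold are illustrative assumptions, not the authors' implementation.

```python
def segment(kernels, feature):
    """Apply each kernel as a 1x1 convolution over a C x H x W feature map.

    kernels: list of N weight vectors, each of length C (assumed given).
    feature: nested list of shape C x H x W (the high-resolution feature).
    Returns N mask logit maps, each of shape H x W.
    """
    height, width = len(feature[0]), len(feature[0][0])
    masks = []
    for kernel in kernels:
        # A 1x1 convolution is a per-pixel dot product over the channel axis.
        mask = [
            [sum(w * channel[y][x] for w, channel in zip(kernel, feature))
             for x in range(width)]
            for y in range(height)
        ]
        masks.append(mask)
    return masks


# Toy feature with C=2, H=2, W=3: channel 0 responds on the left column,
# channel 1 on the right column.
feature = [
    [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],
]
# Two hypothetical kernels, e.g. one per instance or stuff category.
kernels = [[1.0, -1.0], [-1.0, 1.0]]

logits = segment(kernels, feature)
# Threshold the logits to obtain one binary mask per kernel.
binary = [[[v > 0 for v in row] for row in m] for m in logits]
```

Because every thing instance and every stuff category is reduced to one kernel, the same convolution step produces both kinds of prediction, which is what makes the pipeline unified.
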
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Panoptic Segmentation | Cityscapes val | PQ | 61.4 | Panoptic FCN* (ResNet-FPN) |
| Panoptic Segmentation | Cityscapes val | PQth | 54.8 | Panoptic FCN* (ResNet-FPN) |
| Panoptic Segmentation | Cityscapes val | PQst | 70.6 | Panoptic FCN* (Swin-L, Cityscapes-fine) |
| Panoptic Segmentation | Cityscapes val | PQth | 59.5 | Panoptic FCN* (Swin-L, Cityscapes-fine) |
| Panoptic Segmentation | Cityscapes val | PQst | 66.6 | Panoptic FCN* (ResNet-50-FPN) |
| Panoptic Segmentation | Mapillary val | PQ | 45.7 | Panoptic FCN* (Swin-L, single-scale) |
| Panoptic Segmentation | Mapillary val | PQst | 52.1 | Panoptic FCN* (Swin-L, single-scale) |
| Panoptic Segmentation | Mapillary val | PQth | 40.8 | Panoptic FCN* (Swin-L, single-scale) |
| Panoptic Segmentation | Mapillary val | PQ | 36.9 | Panoptic FCN* (ResNet-FPN) |
| Panoptic Segmentation | Mapillary val | PQth | 32.9 | Panoptic FCN* (ResNet-FPN) |
| Panoptic Segmentation | Mapillary val | PQst | 42.3 | Panoptic FCN* (ResNet-50-FPN) |
| Panoptic Segmentation | COCO test-dev | PQ | 52.7 | Panoptic FCN* (Swin-L) |
| Panoptic Segmentation | COCO test-dev | PQth | 59.4 | Panoptic FCN* (Swin-L) |
| Panoptic Segmentation | COCO test-dev | PQ | 47.5 | Panoptic FCN*++ (DCN-101-FPN) |
| Panoptic Segmentation | COCO test-dev | PQst | 38.2 | Panoptic FCN*++ (DCN-101-FPN) |
| Panoptic Segmentation | COCO test-dev | PQth | 53.7 | Panoptic FCN*++ (DCN-101-FPN) |
| Panoptic Segmentation | COCO minival | PQ | 44.3 | Panoptic FCN* (ResNet-50-FPN) |
| Panoptic Segmentation | COCO minival | PQst | 35.6 | Panoptic FCN* (ResNet-50-FPN) |
| Panoptic Segmentation | COCO minival | PQth | 50.0 | Panoptic FCN* (ResNet-50-FPN) |
| Panoptic Segmentation | COCO minival | RQ | 53.0 | Panoptic FCN* (ResNet-50-FPN) |
| Panoptic Segmentation | COCO minival | RQst | 43.5 | Panoptic FCN* (ResNet-50-FPN) |
| Panoptic Segmentation | COCO minival | RQth | 59.3 | Panoptic FCN* (ResNet-50-FPN) |
| Panoptic Segmentation | COCO minival | SQ | 80.7 | Panoptic FCN* (ResNet-50-FPN) |
| Panoptic Segmentation | COCO minival | SQst | 76.7 | Panoptic FCN* (ResNet-50-FPN) |
| Panoptic Segmentation | COCO minival | SQth | 83.4 | Panoptic FCN* (ResNet-50-FPN) |
| Panoptic Segmentation | COCO minival | PQth | 58.5 | Panoptic FCN* (Swin-L, single-scale) |
| Panoptic Segmentation | COCO minival | RQ | 61.6 | Panoptic FCN* (Swin-L, single-scale) |
| Panoptic Segmentation | COCO minival | RQst | 51.1 | Panoptic FCN* (Swin-L, single-scale) |
| Panoptic Segmentation | COCO minival | RQth | 68.6 | Panoptic FCN* (Swin-L, single-scale) |
| Panoptic Segmentation | COCO minival | SQ | 83.2 | Panoptic FCN* (Swin-L, single-scale) |
| Panoptic Segmentation | COCO minival | SQst | 81.1 | Panoptic FCN* (Swin-L, single-scale) |
| Panoptic Segmentation | COCO minival | SQth | 84.6 | Panoptic FCN* (Swin-L, single-scale) |
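
For readers unfamiliar with the metrics in the table, the sketch below computes PQ, SQ, and RQ for a single class using the standard panoptic quality definition (a predicted and a ground-truth segment count as a match when their IoU exceeds 0.5). This definition is not stated in the text above; the function name and the example numbers are illustrative assumptions. Benchmark scores such as those in the table are averaged over classes, so a reported PQ generally does not equal the product of the reported SQ and RQ.

```python
def panoptic_quality(matched_ious, num_fp, num_fn):
    """Return (PQ, SQ, RQ) for one class.

    matched_ious: IoU of each true-positive match (each assumed > 0.5).
    num_fp: number of unmatched predicted segments.
    num_fn: number of unmatched ground-truth segments.
    """
    num_tp = len(matched_ious)
    denom = num_tp + 0.5 * num_fp + 0.5 * num_fn
    if denom == 0:
        # No predicted or ground-truth segments for this class.
        return 0.0, 0.0, 0.0
    # Segmentation quality: mean IoU over matched segments.
    sq = sum(matched_ious) / num_tp if num_tp else 0.0
    # Recognition quality: an F1-style detection score.
    rq = num_tp / denom
    return sq * rq, sq, rq


# Hypothetical class: two matches, one false positive, one false negative.
pq, sq, rq = panoptic_quality([0.8, 0.6], num_fp=1, num_fn=1)
print(f"PQ={pq:.3f} SQ={sq:.3f} RQ={rq:.3f}")
```

The `th` and `st` suffixes in the table refer to the same quantities averaged over thing classes and stuff classes respectively.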