Wenwei Zhang, Jiangmiao Pang, Kai Chen, Chen Change Loy
Semantic, instance, and panoptic segmentation have been addressed using different and specialized frameworks despite their underlying connections. This paper presents a unified, simple, and effective framework for these essentially similar tasks. The framework, named K-Net, segments both instances and semantic categories consistently by a group of learnable kernels, where each kernel is responsible for generating a mask for either a potential instance or a stuff class. To remedy the difficulty of distinguishing various instances, we propose a kernel update strategy that makes each kernel dynamic and conditional on its meaningful group in the input image. K-Net can be trained in an end-to-end manner with bipartite matching, and its training and inference are naturally NMS-free and box-free. Without bells and whistles, K-Net surpasses all previously published state-of-the-art single-model results of panoptic segmentation on the MS COCO test-dev split and semantic segmentation on the ADE20K val split, with 55.2% PQ and 54.3% mIoU, respectively. Its instance segmentation performance is also on par with Cascade Mask R-CNN on MS COCO, with 60%-90% faster inference speeds. Code and models will be released at https://github.com/ZwwWayne/K-Net/.
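The kernel idea in the abstract can be illustrated with a small NumPy sketch. It is not the released implementation: the function names, the tensor sizes, the hard-mask pooling, and in particular the fixed blending weight `alpha` are assumptions made for illustration. The abstract only says the update makes each kernel dynamic and conditional on its group, so a plain blend stands in for whatever learned update the paper uses. The sketch shows that each kernel produces one mask by a dot product with every pixel's feature vector, and that the features under that mask can then be pooled to condition the kernel.

```python
import numpy as np


def predict_masks(kernels, features):
    """kernels: (N, C), features: (C, H, W) -> mask logits (N, H, W).

    Equivalent to a 1x1 convolution of the feature map with each kernel.
    """
    return np.einsum("nc,chw->nhw", kernels, features)


def group_features(mask_logits, features):
    """Pool the features under each kernel's mask -> (N, C).

    Assumption: masks are binarised at logit 0 (sigmoid 0.5) before pooling.
    """
    hard_masks = (mask_logits > 0).astype(features.dtype)
    return np.einsum("nhw,chw->nc", hard_masks, features)


def update_kernels(kernels, grouped, alpha=0.5):
    """Hypothetical simplified update: blend each kernel with its
    unit-normalised group feature. A fixed blend stands in for a learned update.
    """
    norm = np.linalg.norm(grouped, axis=1, keepdims=True)
    grouped_unit = grouped / np.maximum(norm, 1e-8)
    return (1.0 - alpha) * kernels + alpha * grouped_unit


rng = np.random.default_rng(0)
N, C, H, W = 4, 8, 16, 16                      # illustrative sizes only
kernels = rng.standard_normal((N, C))          # stands in for learned kernels
features = rng.standard_normal((C, H, W))      # stands in for backbone features

logits = predict_masks(kernels, features)      # one mask per kernel
grouped = group_features(logits, features)     # content each kernel attends to
new_kernels = update_kernels(kernels, grouped)
refined_logits = predict_masks(new_kernels, features)
print(logits.shape, grouped.shape, refined_logits.shape)
```

Because every kernel yields exactly one mask, predictions can be matched one-to-one with ground-truth segments during training, which is consistent with the abstract's description of the method as NMS-free and box-free.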
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Segmentation | ADE20K val | mIoU | 54.3 | K-Net |
| Instance Segmentation | COCO test-dev | AP50 | 63.3 | K-Net-N256 (ResNet-101) |
| Instance Segmentation | COCO test-dev | APL | 59 | K-Net-N256 (ResNet-101) |
| Instance Segmentation | COCO test-dev | APM | 43.3 | K-Net-N256 (ResNet-101) |
| Instance Segmentation | COCO test-dev | APS | 18.8 | K-Net-N256 (ResNet-101) |
| Instance Segmentation | COCO test-dev | AP50 | 62.8 | K-Net (ResNet-101) |
| Instance Segmentation | COCO test-dev | APL | 58.8 | K-Net (ResNet-101) |
| Instance Segmentation | COCO test-dev | APM | 42.7 | K-Net (ResNet-101) |
| Instance Segmentation | COCO test-dev | APS | 18.7 | K-Net (ResNet-101) |
| Panoptic Segmentation | COCO test-dev | PQ | 55.2 | K-Net (Swin-L) |
| Panoptic Segmentation | COCO test-dev | PQst | 46.2 | K-Net (Swin-L) |
| Panoptic Segmentation | COCO test-dev | PQth | 61.2 | K-Net (Swin-L) |
| Panoptic Segmentation | COCO test-dev | PQ | 48.3 | K-Net (R101-FPN-DCN) |
| Panoptic Segmentation | COCO test-dev | PQst | 39.7 | K-Net (R101-FPN-DCN) |
| Panoptic Segmentation | COCO test-dev | PQth | 54 | K-Net (R101-FPN-DCN) |