Ahmed Abbas, Paul Swoboda
We propose a fully differentiable architecture for simultaneous semantic and instance segmentation (a.k.a. panoptic segmentation) consisting of a convolutional neural network and an asymmetric multiway cut problem solver. The latter solves a combinatorial optimization problem that elegantly incorporates semantic and boundary predictions to produce a panoptic labeling. Our formulation allows us to directly maximize a smooth surrogate of the panoptic quality metric by backpropagating the gradient through the optimization problem. Experimental evaluation on the Cityscapes and COCO datasets shows that backpropagating through the optimization problem improves results over comparable approaches. Overall, our approach shows the utility of using combinatorial optimization in tandem with deep learning on a challenging large-scale real-world problem and showcases the benefits of, and insights into, training such an architecture.
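For readers unfamiliar with the panoptic quality (PQ) metric mentioned above, the following is a minimal sketch of its standard, non-smooth definition for a single class: the sum of IoUs over matched segment pairs divided by |TP| + 0.5|FP| + 0.5|FN|, where a predicted and a ground-truth segment match if their IoU exceeds 0.5. It is not the smooth surrogate used for training, and the function name `panoptic_quality` and the representation of segments as sets of pixel indices are assumptions made for illustration only.

```python
def panoptic_quality(pred_segments, gt_segments):
    """Standard PQ for one class (illustrative sketch, not the paper's surrogate).

    Each segment is a set of pixel indices (an assumed representation).
    A prediction and a ground-truth segment form a true positive when
    their IoU exceeds 0.5; with non-overlapping segments this threshold
    makes the matching unique.
    """
    matched_gt = set()
    iou_sum = 0.0
    tp = 0
    for pred in pred_segments:
        for j, gt in enumerate(gt_segments):
            if j in matched_gt:
                continue
            union = len(pred | gt)
            iou = len(pred & gt) / union if union else 0.0
            if iou > 0.5:
                matched_gt.add(j)
                iou_sum += iou
                tp += 1
                break
    fp = len(pred_segments) - tp  # unmatched predictions
    fn = len(gt_segments) - tp    # unmatched ground-truth segments
    denom = tp + 0.5 * fp + 0.5 * fn
    return iou_sum / denom if denom else 0.0


# One matched pair with IoU 3/4, one false positive, one false negative.
pred = [{0, 1, 2, 3}, {10, 11}]
gt = [{0, 1, 2}, {20, 21}]
print(panoptic_quality(pred, gt))
```

In this toy case the single match contributes an IoU of 0.75 and the denominator is 1 + 0.5 + 0.5 = 2. The hard 0.5 threshold and the counting of matches make this quantity piecewise constant in the network outputs, which is why a smooth surrogate is needed for gradient-based training.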
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Panoptic Segmentation | Cityscapes test | PQ | 60 | COPS (ResNet-50) |
| Panoptic Segmentation | Cityscapes val | AP | 34.1 | COPS (ResNet-50) |
| Panoptic Segmentation | Cityscapes val | PQ | 62.1 | COPS (ResNet-50) |
| Panoptic Segmentation | Cityscapes val | PQst | 67.2 | COPS (ResNet-50) |
| Panoptic Segmentation | Cityscapes val | PQth | 55.1 | COPS (ResNet-50) |
| Panoptic Segmentation | Cityscapes val | mIoU | 79.3 | COPS (ResNet-50) |
| Panoptic Segmentation | COCO test-dev | PQ | 38.5 | COPS (ResNet-50) |
| Panoptic Segmentation | COCO test-dev | PQst | 34.8 | COPS (ResNet-50) |
| Panoptic Segmentation | COCO test-dev | PQth | 41 | COPS (ResNet-50) |