Bin Dong, Fangao Zeng, Tiancai Wang, Xiangyu Zhang, Yichen Wei
In this paper, we propose an end-to-end framework for instance segmentation. Based on the recently introduced DETR [1], our method, termed SOLQ, segments objects by learning unified queries. In SOLQ, each query represents one object and has multiple representations: class, location and mask. The learned object queries perform classification, box regression and mask encoding simultaneously in a unified vector form. During training, the encoded mask vectors are supervised by the compression coding of the raw spatial masks. At inference time, the predicted mask vectors are transformed directly into spatial masks by the inverse process of the compression coding. Experimental results show that SOLQ achieves state-of-the-art performance, surpassing most existing approaches. Moreover, the joint learning of the unified query representation greatly improves the detection performance of DETR. We hope SOLQ can serve as a strong baseline for Transformer-based instance segmentation. Code is available at https://github.com/megvii-research/SOLQ.
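The mask encoding and decoding described in the abstract can be illustrated with a small sketch. The abstract does not say which compression coding is used, so this example assumes a 2D discrete cosine transform (DCT) that keeps only a low-frequency block of coefficients. The names `dct_matrix`, `encode_mask`, `decode_mask` and the parameter `keep` are hypothetical and are not taken from the SOLQ code base.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of shape (n, n)."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0, :] = np.sqrt(1.0 / n)  # rescale the DC row so the matrix is orthonormal
    return mat


def encode_mask(mask: np.ndarray, keep: int) -> np.ndarray:
    """Compress a square binary mask into a vector of keep*keep DCT coefficients.

    Assumption: DCT stands in for the unspecified "compression coding".
    """
    d = dct_matrix(mask.shape[0])
    coeffs = d @ mask.astype(np.float64) @ d.T  # 2D DCT as two 1D transforms
    return coeffs[:keep, :keep].reshape(-1)     # keep the low-frequency block


def decode_mask(vec: np.ndarray, size: int, keep: int) -> np.ndarray:
    """Inverse process: zero-pad the coefficients, invert the DCT, threshold."""
    d = dct_matrix(size)
    coeffs = np.zeros((size, size))
    coeffs[:keep, :keep] = vec.reshape(keep, keep)
    recon = d.T @ coeffs @ d                    # inverse 2D DCT
    return recon > 0.5


# Toy example: a 32x32 mask containing one rectangular object.
mask = np.zeros((32, 32), dtype=bool)
mask[8:24, 10:20] = True

target_vec = encode_mask(mask, keep=8)          # training target for the mask vector
approx = decode_mask(target_vec, size=32, keep=8)
lossless = decode_mask(encode_mask(mask, keep=32), size=32, keep=32)
```

In this sketch, `target_vec` plays the role of the supervision signal for the mask vector during training, and `decode_mask` plays the role of the inverse transform applied to a predicted vector at inference time. Keeping all coefficients (`keep=32`) recovers the mask exactly, while a smaller `keep` trades reconstruction accuracy for a shorter vector.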
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Object Detection | COCO test-dev | AP50 | 74.6 | SOLQ (Swin-L, single scale) |
| Object Detection | COCO test-dev | AP75 | 60.5 | SOLQ (Swin-L, single scale) |
| Object Detection | COCO test-dev | APL | 70.6 | SOLQ (Swin-L, single scale) |
| Object Detection | COCO test-dev | APM | 60 | SOLQ (Swin-L, single scale) |
| Object Detection | COCO test-dev | APS | 37.6 | SOLQ (Swin-L, single scale) |
| Object Detection | COCO test-dev | box mAP | 56.5 | SOLQ (Swin-L, single scale) |
| Object Detection | COCO test-dev | box mAP | 48.7 | SOLQ (ResNet101, single scale) |
| Object Detection | COCO test-dev | box mAP | 47.8 | SOLQ (ResNet50, single scale) |
| Object Detection | COCO minival | AP50 | 74.9 | SOLQ (Swin-L, single scale) |
| Object Detection | COCO minival | AP75 | 61.3 | SOLQ (Swin-L, single scale) |
| Object Detection | COCO minival | APL | 71.9 | SOLQ (Swin-L, single scale) |
| Instance Segmentation | COCO test-dev | mask AP | 46.7 | SOLQ (Swin-L, single scale) |
| Instance Segmentation | COCO test-dev | mask AP | 40.9 | SOLQ (ResNet101, single scale) |
| Instance Segmentation | COCO test-dev | mask AP | 39.7 | SOLQ (ResNet50, single scale) |