Metric: APS (higher is better)
| # | Model | APS | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | OneFormer (InternImage-H, emb_dim=1024, single-scale, 896x896, COCO-Pretrained) | 23.7 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 2 | OneFormer (DiNAT-L, single-scale, 1280x1280, COCO-pretrain) | 19.2 | Yes | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 3 | X-Decoder (Davit-d5, Deform, single-scale, 1280x1280) | 18.9 | Yes | Generalized Decoding for Pixel, Image, and Langu... | 2022-12-21 | Code |
| 4 | DiNAT-L (Mask2Former, single-scale) | 16.3 | No | Dilated Neighborhood Attention Transformer | 2022-09-29 | Code |
| 5 | Mask2Former (Swin-L, single-scale) | 16.3 | No | Masked-attention Mask Transformer for Universal ... | 2021-12-02 | Code |
| 6 | Mask2Former (Swin-L + FAPN) | 14.6 | No | Masked-attention Mask Transformer for Universal ... | 2021-12-02 | Code |
| 7 | Mask2Former (ResNet50) | 10.4 | No | Masked-attention Mask Transformer for Universal ... | 2021-12-02 | Code |