Metric: mIoU (higher is better)
| # | Model | mIoU | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | OneFormer (InternImage-H, emb_dim=256, single-scale, 896x896) | 60.4 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 2 | X-Decoder (Davit-d5, Deform, single-scale, 1280x1280) | 59.1 | Yes | Generalized Decoding for Pixel, Image, and Langu... | 2022-12-21 | Code |
| 3 | OneFormer (DiNAT-L, single-scale, 1280x1280, COCO-Pretrain) | 58.9 | Yes | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 4 | OneFormer (DiNAT-L, single-scale, 1280x1280) | 58.3 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 5 | OneFormer (DiNAT-L, single-scale, 640x640) | 58.3 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 6 | X-Decoder (L) | 58.1 | Yes | Generalized Decoding for Pixel, Image, and Langu... | 2022-12-21 | Code |
| 7 | OneFormer (ConvNeXt-XL, single-scale, 640x640) | 57.4 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 8 | OneFormer (Swin-L, single-scale, 1280x1280) | 57.0 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 9 | OneFormer (Swin-L, single-scale, 640x640) | 57.0 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 10 | OneFormer (ConvNeXt-L, single-scale, 640x640) | 56.6 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 11 | DiNAT-L (Mask2Former, 640x640) | 56.3 | No | Dilated Neighborhood Attention Transformer | 2022-09-29 | Code |
| 12 | Mask2Former (Swin-L + FAPN, 640x640) | 55.4 | No | Masked-attention Mask Transformer for Universal ... | 2021-12-02 | Code |
| 13 | kMaX-DeepLab (ConvNeXt-L, single-scale, 1281x1281) | 55.2 | No | kMaX-DeepLab: k-means Mask Transformer | 2022-07-08 | Code |
| 14 | kMaX-DeepLab (ConvNeXt-L, single-scale, 641x641) | 54.8 | No | kMaX-DeepLab: k-means Mask Transformer | 2022-07-08 | Code |
| 15 | Mask2Former (Swin-L) | 54.5 | No | Masked-attention Mask Transformer for Universal ... | 2021-12-02 | Code |
| 16 | Panoptic-DeepLab (SwideRNet) | 50.0 | No | Masked-attention Mask Transformer for Universal ... | 2021-12-02 | Code |
| 17 | Mask2Former (ResNet-50, 640x640) | 46.1 | No | Masked-attention Mask Transformer for Universal ... | 2021-12-02 | Code |
| 18 | kMaX-DeepLab (ResNet50, single-scale, 1281x1281) | 45.3 | No | kMaX-DeepLab: k-means Mask Transformer | 2022-07-08 | Code |
| 19 | kMaX-DeepLab (ResNet50, single-scale, 641x641) | 45.0 | No | kMaX-DeepLab: k-means Mask Transformer | 2022-07-08 | Code |
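
For readers unfamiliar with the metric, the following is a minimal sketch of how mIoU is commonly computed: per-class intersection over union, averaged over the classes that occur. It is not the benchmark's official evaluation code. The function name `mean_iou`, the flat label lists, and the ignore label `255` are assumptions made for illustration, and the result is a fraction in [0, 1] that would be multiplied by 100 to compare with the percentages in the table.

```python
from typing import Sequence


def mean_iou(
    pred: Sequence[int],
    target: Sequence[int],
    num_classes: int,
    ignore_index: int = 255,  # assumed "unlabeled" value, not taken from the leaderboard
) -> float:
    """Mean IoU over classes that appear in the prediction or the ground truth."""
    intersection = [0] * num_classes
    pred_area = [0] * num_classes
    target_area = [0] * num_classes

    # Accumulate per-class pixel counts, skipping unlabeled ground-truth pixels.
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        pred_area[p] += 1
        target_area[t] += 1
        if p == t:
            intersection[p] += 1

    # IoU per class = |pred AND target| / |pred OR target|; absent classes are skipped.
    ious = []
    for c in range(num_classes):
        union = pred_area[c] + target_area[c] - intersection[c]
        if union > 0:
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes, last pixel unlabeled.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {100 * score:.1f}")
```

In this toy case the per-class IoUs are 1/2, 2/3, and 1, so the mean is 13/18. Real evaluations typically accumulate the counts over every image in the validation set before dividing, which this sketch also does if the flattened labels of all images are passed in together.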