Metric: AP (higher is better)
| # | Model | AP | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | OneFormer (InternImage-H, emb_dim=256, single-scale, 896x896) | 40.2 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 2 | X-Decoder (Davit-d5, Deform, single-scale, 1280x1280) | 38.7 | Yes | Generalized Decoding for Pixel, Image, and Langu... | 2022-12-21 | Code |
| 3 | OneFormer (Swin-L, single-scale, 1280x1280) | 37.8 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 4 | OneFormer (DiNAT-L, single-scale, 1280x1280) | 37.1 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 5 | OneFormer (ConvNeXt-XL, single-scale, 640x640) | 36.3 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 6 | OneFormer (ConvNeXt-L, single-scale, 640x640) | 36.2 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 7 | OneFormer (DiNAT-L, single-scale, 640x640) | 36.0 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 8 | OneFormer (Swin-L, single-scale, 640x640) | 35.9 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 9 | X-Decoder (L) | 35.8 | Yes | Generalized Decoding for Pixel, Image, and Langu... | 2022-12-21 | Code |
| 10 | DiNAT-L (Mask2Former, 640x640) | 35.0 | No | Dilated Neighborhood Attention Transformer | 2022-09-29 | Code |
| 11 | Mask2Former (Swin-L) | 34.2 | No | Masked-attention Mask Transformer for Universal ... | 2021-12-02 | Code |
| 12 | Mask2Former (Swin-L + FAPN, 640x640) | 33.2 | No | Masked-attention Mask Transformer for Universal ... | 2021-12-02 | Code |
| 13 | Mask2Former (ResNet-50, 640x640) | 26.5 | No | Masked-attention Mask Transformer for Universal ... | 2021-12-02 | Code |