Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Instance Segmentation on ADE20K val

Metric: AP (average precision over instance masks; higher is better)
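As a rough illustration of the metric, COCO-style mask AP is typically computed by building a precision-recall curve from score-ranked detections and taking the area under it, then averaging over IoU thresholds (0.50:0.05:0.95) and classes. A minimal sketch of the per-threshold AP step (the `matches` format and all-point integration are simplifying assumptions, not the exact evaluator used here):

```python
def average_precision(matches, num_gt):
    """Toy AP at a single IoU threshold.

    matches: list of (confidence_score, is_true_positive), one per detection.
    num_gt:  number of ground-truth instances.
    """
    # Rank detections by confidence, highest first.
    matches = sorted(matches, key=lambda m: m[0], reverse=True)
    tp = fp = 0
    points = []
    for _, is_tp in matches:
        if is_tp:
            tp += 1
        else:
            fp += 1
        points.append((tp / (tp + fp), tp / num_gt))  # (precision, recall)
    # Rectangular (all-point) area under the precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for precision, recall in points:
        ap += precision * (recall - prev_recall)
        prev_recall = recall
    return ap
```

The full benchmark score would repeat this at each IoU threshold and average, which is why a single-point mask quality improvement can move AP by only fractions of a point.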


Results

| # | Model | AP | Extra Data | Paper | Date | Code |
|---|-------|----|------------|-------|------|------|
| 1 | OneFormer (InternImage-H, emb_dim=1024, single-scale, 896x896, COCO-Pretrained) | 44.2 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 2 | OpenSeeD | 42.6 | Yes | A Simple Framework for Open-Vocabulary Segmentat... | 2023-03-14 | Code |
| 3 | ViT-P (OneFormer, DiNAT-L, single-scale, 1280x1280, COCO_pretrain) | 40.7 | Yes | The Missing Point in Vision Transformers for Uni... | 2025-05-26 | Code |
| 4 | OneFormer (DiNAT-L, single-scale, 1280x1280, COCO-pretrain) | 40.2 | Yes | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 5 | X-Decoder (Davit-d5, Deform, single-scale, 1280x1280) | 38.7 | Yes | Generalized Decoding for Pixel, Image, and Langu... | 2022-12-21 | Code |
| 6 | ViT-P (OneFormer, DiNAT-L, single-scale, 1280x1280) | 37.8 | No | The Missing Point in Vision Transformers for Uni... | 2025-05-26 | Code |
| 7 | OneFormer (DiNAT-L, single-scale) | 36.0 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 8 | OneFormer (Swin-L, single-scale) | 35.9 | No | OneFormer: One Transformer to Rule Universal Ima... | 2022-11-10 | Code |
| 9 | X-Decoder (L) | 35.8 | Yes | Generalized Decoding for Pixel, Image, and Langu... | 2022-12-21 | Code |
| 10 | DiNAT-L (Mask2Former, single-scale) | 35.4 | No | Dilated Neighborhood Attention Transformer | 2022-09-29 | Code |
| 11 | Mask2Former (Swin-L, single-scale) | 34.9 | No | Masked-attention Mask Transformer for Universal ... | 2021-12-02 | Code |
| 12 | Mask2Former (Swin-L + FAPN) | 33.4 | No | Masked-attention Mask Transformer for Universal ... | 2021-12-02 | Code |
| 13 | Mask2Former (ResNet50) | 26.4 | No | Masked-attention Mask Transformer for Universal ... | 2021-12-02 | Code |
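Since the "Extra Data" column changes how results should be compared, a quick sketch of filtering the leaderboard by that flag (only a few representative rows from the table are reproduced; the record layout is an assumption for illustration):

```python
# A few rows from the leaderboard above, as plain records.
rows = [
    {"model": "OneFormer (InternImage-H)", "ap": 44.2, "extra_data": False},
    {"model": "OpenSeeD", "ap": 42.6, "extra_data": True},
    {"model": "ViT-P (OneFormer, DiNAT-L)", "ap": 40.7, "extra_data": True},
    {"model": "Mask2Former (ResNet50)", "ap": 26.4, "extra_data": False},
]

# Best result that did not use extra training data.
best_no_extra = max(
    (r for r in rows if not r["extra_data"]), key=lambda r: r["ap"]
)
print(best_no_extra["model"], best_no_extra["ap"])
```

Note that the top entry overall (OneFormer with InternImage-H) reports no extra data, so here the two rankings happen to share the same leader.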