ViT-P (OneFormer, DiNAT-L, single-scale, 1280x1280, COCO_pretrain)

Reported on 4 benchmarks across 4 tasks · 1 paper

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Computer Vision · 2 results

Instance Segmentation on ADE20K val
AP · uses extra data · 2025-05-26
40.7
best: 44.2 (OneFormer (InternImage-H, emb_dim=1024, single-scale, 896x896, COCO-Pretrained))
The Missing Point in Vision Transformers for Universal Image Segmentation arXiv:2505.19795
Panoptic Segmentation on ADE20K val
PQ · uses extra data · 2025-05-26
54
best: 54.5 (OneFormer (InternImage-H, emb_dim=256, single-scale, 896x896))
The Missing Point in Vision Transformers for Universal Image Segmentation arXiv:2505.19795