Semantic segmentation on Cityscapes val

Metric: mIoU (higher is better)
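
For reference, mIoU (mean intersection over union) averages the per-class IoU between predicted and ground-truth labels. The following is a minimal sketch, not the official evaluation script: the function name `mean_iou`, the flattened-list inputs, and the `ignore_index=255` default are assumptions made for illustration, and predictions are assumed to lie in `[0, num_classes)`.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over the classes that occur in pred or target.

    pred and target are flat sequences of integer class ids of equal length.
    Pixels whose target equals ignore_index are skipped (assumed convention).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            # Correct pixel: counts once toward intersection and union.
            intersection[t] += 1
            union[t] += 1
        else:
            # Wrong pixel: enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    # Classes that never occur are left out of the average.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example: class 0 has IoU 1/2, class 1 has IoU 2/3, last pixel is ignored.
score = mean_iou([0, 0, 1, 1, 2], [0, 1, 1, 1, 255], num_classes=3)
print(round(100 * score, 1))  # 58.3, on the percentage scale used in the table
```

To score a whole validation set with this sketch, concatenate the labels of all images before calling the function, so that counts are accumulated over the dataset rather than averaged per image.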

Results

#	Model	mIoU	Extra Data	Paper	Date	Code
1	EfficientPS (Cityscapes-fine)	90.3	No	EfficientPS: Efficient Panoptic Segmentation	2020-04-05	Code
2	ViT-P (InternImage-H)	87.4	Yes	The Missing Point in Vision Transformers for Universal Image Segmentation	2025-05-26	Code
3	SERNet-Former	87.35	No	SERNet-Former: Semantic Segmentation by Efficient Residual Network with Attention-Boosting Gates and Attention-Fusion Networks	2024-01-28	Code
4	MetaPrompt-SD	87.1	Yes	Harnessing Diffusion Models for Visual Perception with Meta Prompts	2023-12-22	Code
5	InternImage-H	87	Yes	InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions	2022-11-10	Code
6	HRNetV2-OCR+PSA	86.93	Yes	Polarized Self-Attention: Towards High-quality Pixel-wise Regression	2021-07-02	Code
7	InternImage-XL	86.4	Yes	InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions	2022-11-10	Code
8	HRNet-OCR	86.3	Yes	Hierarchical Multi-Scale Attention for Semantic Segmentation	2020-05-21	Code
9	Depth Anything	86.2	No	Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data	2024-01-19	Code
10	OneFormer (ConvNeXt-XL, Mapillary, multi-scale)	85.8	Yes	OneFormer: One Transformer to Rule Universal Image Segmentation	2022-11-10	Code
11	ViT-Adapter-L	85.8	Yes	Vision Transformer Adapter for Dense Predictions	2022-05-17	Code
12	ViT-P (OneFormer, InternImage-H)	85.4	No	The Missing Point in Vision Transformers for Universal Image Segmentation	2025-05-26	Code
13	Panoptic-DeepLab (SWideRNet [1, 1, 4.5], Mapillary Vistas, multi-scale)	85.3	Yes	Scaling Wide Residual Networks for Panoptic Segmentation	2020-11-23	-
14	SeMask (SeMask Swin-L Mask2Former)	84.98	No	SeMask: Semantically Masked Transformers for Semantic Segmentation	2021-12-23	Code
15	Sequential Ensemble (MiT-B5 + HRNet)	84.8	No	Sequential Ensembling for Semantic Segmentation	2022-10-08	-
16	Soft Labells (HRnet)	84.8	No	Soft labelling for semantic segmentation: Bringing coherence to label down-sampling	2023-02-27	Code
17	OneFormer (ConvNeXt-XL, multi-scale)	84.6	No	OneFormer: One Transformer to Rule Universal Image Segmentation	2022-11-10	Code
18	OneFormer (ConvNeXt-L, single-scale, 512x1024, Mapillary Vistas-pretrained)	84.6	Yes	OneFormer: One Transformer to Rule Universal Image Segmentation	2022-11-10	Code
19	Axial-DeepLab-XL (Mapillary Vistas, multi-scale)	84.6	Yes	Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation	2020-03-17	Code
20	Panoptic-DeepLab (SWideRNet [1, 1, 4.5], Mapillary Vistas, single-scale)	84.6	Yes	Scaling Wide Residual Networks for Panoptic Segmentation	2020-11-23	-
21	DiNAT-L (Mask2Former)	84.5	No	Dilated Neighborhood Attention Transformer	2022-09-29	Code
22	OneFormer (Swin-L, multi-scale)	84.4	No	OneFormer: One Transformer to Rule Universal Image Segmentation	2022-11-10	Code
23	VPNeXt	84.4	No	VPNeXt -- Rethinking Dense Decoding for Plain Vision Transformer	2025-02-23	-
24	VOLO-D4 (MS, ImageNet1k pretrain)	84.3	No	VOLO: Vision Outlooker for Visual Recognition	2021-06-24	Code
25	Mask2Former (Swin-L)	84.3	No	Masked-attention Mask Transformer for Universal Image Segmentation	2021-12-02	Code
26	EoMT (DINOv2-L, single-scale, 1024x1024)	84.2	No	Your ViT is Secretly an Image Segmentation Model	2025-03-24	Code
27	SegFormer (MiT-B5, Mapillary)	84	Yes	SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers	2021-05-31	Code
28	DDP (ConvNeXt-L, step-3)	83.9	No	DDP: Diffusion Model for Dense Visual Prediction	2023-03-30	Code
29	HRNetV2 + OCR + RMI (PaddleClas pretrained)	83.6	No	Segmentation Transformer: Object-Contextual Representations for Semantic Segmentation	2019-09-24	Code
30	OneFormer (ConvNeXt-XL, single-scale)	83.6	No	OneFormer: One Transformer to Rule Universal Image Segmentation	2022-11-10	Code
31	SynBoost	83.5	No	Pixel-wise Anomaly Detection in Complex Driving Scenes	2021-03-09	Code
32	kMaX-DeepLab (single-scale)	83.5	No	kMaX-DeepLab: k-means Mask Transformer	2022-07-08	Code
33	HRNetV2+OCR+CBL(ImageNet pretrained)	83.4	No	-	-	Code
34	DiNAT-L (Mask2Former)	83.4	No	Dilated Neighborhood Attention Transformer	2022-09-29	Code
35	EfficientViT-B3 (r1184x2368)	83.2	No	EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction	2022-05-29	Code
36	OneFormer (DiNAT-L, single-scale)	83.1	No	OneFormer: One Transformer to Rule Universal Image Segmentation	2022-11-10	Code
37	OneFormer (ConvNeXt-L, single-scale)	83	No	OneFormer: One Transformer to Rule Universal Image Segmentation	2022-11-10	Code
38	AFF-Base (single-scale, point-based Mask2Former)	83	No	AutoFocusFormer: Image Segmentation off the Grid	2023-04-24	Code
39	OneFormer (Swin-L, single-scale)	83	No	OneFormer: One Transformer to Rule Universal Image Segmentation	2022-11-10	Code
40	Mask2Former (Swin-L)	82.9	No	Masked-attention Mask Transformer for Universal Image Segmentation	2021-12-02	Code
41	FAN-L-Hybrid+STL	82.8	No	Fully Attentional Networks with Self-emerging Token Labeling	2024-01-08	Code
42	ResNeSt-200	82.7	No	ResNeSt: Split-Attention Networks	2020-04-19	Code
43	WaveMix	82.7	No	WaveMix: A Resource-efficient Neural Network for Image Analysis	2022-05-28	Code
44	CMX (B4)	82.6	No	CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers	2022-03-09	Code
45	WaveMix-256/16 (Level-4)	82.6	No	WaveMix: A Resource-efficient Neural Network for Image Analysis	2022-05-28	Code
46	FAN-L-Hybrid	82.3	No	Understanding The Robustness in Vision Transformers	2022-04-26	Code
47	AFF-Small (single-scale, point-based Mask2Former)	82.2	No	AutoFocusFormer: Image Segmentation off the Grid	2023-04-24	Code
48	SETR-PUP (80k, MS)	82.15	No	Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers	2020-12-31	Code
49	EfficientPS	82.1	Yes	EfficientPS: Efficient Panoptic Segmentation	2020-04-05	Code
50	DSNet-Base(single-scale)	82	No	DSNet: A Novel Way to Use Atrous Convolutions in Semantic Segmentation	2024-06-06	Code
51	CMX (B2)	81.6	No	CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers	2022-03-09	Code
52	Soft Labells (Deeplab)	81.5	No	-	-	-
53	Panoptic-DeepLab (X71)	81.5	Yes	Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation	2019-11-22	Code
54	CMT-DeepLab (MaX-S, single-scale, IN-1K)	81.4	No	CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation	2022-06-17	Code
55	HRNetV2 (HRNetV2-W48)	81.1	No	Deep High-Resolution Representation Learning for Visual Recognition	2019-08-20	Code
56	DEPICT-SA (ViT-L multi-scale)	81	No	Rethinking Decoders for Transformer-based Semantic Segmentation: A Compression Perspective	2024-11-05	Code
57	OCR (ResNet-101-FCN)	80.6	No	Segmentation Transformer: Object-Contextual Representations for Semantic Segmentation	2019-09-24	Code
58	DSNet(single-scale)	80.4	No	DSNet: A Novel Way to Use Atrous Convolutions in Semantic Segmentation	2024-06-06	Code
59	SeMask (SeMask Swin-L FPN)	80.39	Yes	SeMask: Semantically Masked Transformers for Semantic Segmentation	2021-12-23	Code
60	SML	80.33	No	Standardized Max Logits: A Simple yet Effective Approach for Identifying Unexpected Road Obstacles in Urban-Scene Segmentation	2021-07-23	Code
61	HRNetV2 (HRNetV2-W40)	80.2	No	Deep High-Resolution Representation Learning for Visual Recognition	2019-08-20	Code
62	Dynamically Instantiated Network (ResNet-101)	79.8	No	Weakly- and Semi-Supervised Panoptic Segmentation	2018-08-10	Code
63	PSPNet (Dilated-ResNet-101)	79.7	No	Pyramid Scene Parsing Network	2016-12-04	Code
64	DeepLabv3+ (Dilated-Xception-71)	79.6	No	Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation	2018-02-07	Code
65	DDRNet23	79.4	No	Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes	2021-01-15	Code
66	COPS (ResNet-50)	79.3	No	Combinatorial Optimization for Panoptic Segmentation: A Fully Differentiable Approach	2021-06-06	Code
67	AdaptIS (ResNeXt-101)	79.2	No	AdaptIS: Adaptive Instance Selection Network	2019-09-17	-
68	UPSNet (ResNet-101, multiscale)	79.2	Yes	UPSNet: A Unified Panoptic Segmentation Network	2019-01-12	Code
69	DEPICT-SA (ViT-L single-scale)	78.8	No	Rethinking Decoders for Transformer-based Semantic Segmentation: A Compression Perspective	2024-11-05	Code
70	SemanticFPN P2-P5 + PointRend	78.6	No	PointRend: Image Segmentation as Rendering	2019-12-17	Code
71	StreamDEQ (8 iterations)	78.2	No	Representation Recycling for Streaming Video Analysis	2022-04-28	Code
72	PP-LiteSeg-B2	78.2	No	PP-LiteSeg: A Superior Real-Time Semantic Segmentation Model	2022-04-06	Code
73	TASCNet (ResNet-50, multi-scale)	78	Yes	Learning to Fuse Things and Stuff	2018-12-04	-
74	HALO	77.8	No	Hyperbolic Active Learning for Semantic Segmentation under Domain Shift	2023-06-19	Code
75	UPSNet (ResNet-101)	77.8	Yes	UPSNet: A Unified Panoptic Segmentation Network	2019-01-12	Code
76	TASCNet (ResNet-50)	77.8	Yes	Learning to Fuse Things and Stuff	2018-12-04	-
77	DDRNet23-slim	77.4	No	Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes	2021-01-15	Code
78	AdaptIS (ResNet-101)	77.2	No	AdaptIS: Adaptive Instance Selection Network	2019-09-17	-
79	EEEA-Net-C2 (ours)	76.8	No	EEEA-Net: An Early Exit Evolutionary Neural Architecture Search	2021-08-13	Code
80	WaveMixLite-256/16	76.79	No	-	-	Code
81	SwinMTL	76.41	No	SwinMTL: A Shared Architecture for Simultaneous Depth Estimation and Semantic Segmentation from Monocular Camera Images	2024-03-15	Code
82	CSFNet-2	76.36	No	CSFNet: A Cosine Similarity Fusion Network for Real-Time RGB-X Semantic Segmentation of Driving Scenes	2024-07-01	Code
83	CSFNet-2	76.36	No	CSFNet: A Cosine Similarity Fusion Network for Real-Time RGB-X Semantic Segmentation of Driving Scenes	2024-07-01	Code
84	RepMLPNet-D256	76.27	No	RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality	2021-12-21	Code
85	PP-LiteSeg-T2	76	No	PP-LiteSeg: A Superior Real-Time Semantic Segmentation Model	2022-04-06	Code
86	Dilated-ResNet (Dilated-ResNet-101)	75.7	No	Deep Residual Learning for Image Recognition	2015-12-10	Code
87	Panoptic FPN (ResNet-101)	75.7	No	Panoptic Feature Pyramid Networks	2019-01-08	Code
88	AUNet (ResNet-101-FPN)	75.6	No	Attention-guided Unified Network for Panoptic Segmentation	2018-12-10	-
89	UNet++ (ResNet-101)	75.5	No	UNet++: A Nested U-Net Architecture for Medical Image Segmentation	2018-07-18	Code
90	AdaptIS (ResNet-50)	75.3	No	AdaptIS: Adaptive Instance Selection Network	2019-09-17	-
91	PP-LiteSeg-B1	75.3	No	PP-LiteSeg: A Superior Real-Time Semantic Segmentation Model	2022-04-06	Code
92	ReLICv2	75.2	No	Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet?	2022-01-13	Code
93	UPSNet (ResNet-50)	75.2	No	UPSNet: A Unified Panoptic Segmentation Network	2019-01-12	Code
94	CSFNet-1	74.73	No	CSFNet: A Cosine Similarity Fusion Network for Real-Time RGB-X Semantic Segmentation of Driving Scenes	2024-07-01	Code
95	CSFNet-1	74.73	No	CSFNet: A Cosine Similarity Fusion Network for Real-Time RGB-X Semantic Segmentation of Driving Scenes	2024-07-01	Code
96	BYOL	74.6	Yes	Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet?	2022-01-13	Code
97	FasterSeg	73.1	No	FasterSeg: Searching for Faster Real-time Semantic Segmentation	2019-12-23	Code
98	PP-LiteSeg-T1	73.1	No	PP-LiteSeg: A Superior Real-Time Semantic Segmentation Model	2022-04-06	Code
99	StreamDEQ (4 iterations)	71.5	No	Representation Recycling for Streaming Video Analysis	2022-04-28	Code
100	Fast-SCNN + Coarse + ImageNet	69.19	No	Fast-SCNN: Fast Semantic Segmentation Network	2019-02-12	Code
101	DiCENet	63.4	No	DiCENet: Dimension-wise Convolutions for Efficient Networks	2019-06-08	Code
102	DCT-EDANet	61.6	No	Exploring Semantic Segmentation on the DCT Representation	2019-07-23	-
103	StreamDEQ (2 iterations)	57.9	No	Representation Recycling for Streaming Video Analysis	2022-04-28	Code
104	CARB	52.1	No	Weakly Supervised Semantic Segmentation for Driving Scenes	2023-12-21	Code
105	CorrCLIP	51.1	No	CorrCLIP: Reconstructing Correlations in CLIP with Off-the-Shelf Foundation Models for Open-Vocabulary Semantic Segmentation	2024-11-15	Code
106	Trident	47.6	No	Harnessing Vision Foundation Models for High-Performance, Training-Free Open Vocabulary Segmentation	2024-11-14	Code
107	StreamDEQ (1 iterations)	45.5	No	Representation Recycling for Streaming Video Analysis	2022-04-28	Code
108	MRFP+(Ours) Resnet50	42.4	No	MRFP: Learning Generalizable Semantic Segmentation from Sim-2-Real with Multi-Resolution Feature Perturbation	2023-11-30	Code
109	ProxyCLIP	42	No	ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation	2024-08-09	Code
110	COSMOS ViT-B/16	34.7	No	COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training	2024-12-02	Code
111	Resnet50	34.66	No	MRFP: Learning Generalizable Semantic Segmentation from Sim-2-Real with Multi-Resolution Feature Perturbation	2023-11-30	Code
112	TTD (MaskCLIP)	32	No	TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias	2024-03-30	Code
113	TagAlign	27.5	No	TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification	2023-12-21	Code
114	TTD (TCL)	27	No	TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias	2024-03-30	Code
115	ReCo+	24.2	No	ReCo: Retrieve and Co-segment for Zero-shot Transfer	2022-06-14	Code
116	TCL	24	No	Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs	2022-12-01	Code
117	Segmenter ViT-S/16	21.8	No	Drive&Segment: Unsupervised Semantic Segmentation of Urban Scenes via Cross-modal Distillation	2022-03-21	Code
118	ReCo	19.3	No	ReCo: Retrieve and Co-segment for Zero-shot Transfer	2022-06-14	Code
119	CLIPpy ViT-B	18.1	No	Perceptual Grouping in Contrastive Vision-Language Models	2022-10-18	Code
120	MaskCLIP	10	No	Extract Free Dense Labels from CLIP	2021-12-02	Code
