Semantic Segmentation on NYU Depth v2
Metric: Mean IoU (higher is better)
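The page does not say how Mean IoU is computed for the submitted results. As a rough illustration only, the usual definition (per-class intersection over union, averaged over the classes that occur, reported in percent) can be sketched as below. The function name `mean_iou`, its parameters, and the `ignore_label` convention are assumptions made for this sketch, not the evaluation code behind any entry on the leaderboard.

```python
from typing import Optional, Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int,
             ignore_label: Optional[int] = None) -> float:
    """Mean IoU in percent over flattened per-pixel class labels.

    Hypothetical helper: assumes labels are integers in [0, num_classes)
    and that pixels whose ground truth equals ignore_label are skipped.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes   # pixels where pred == target == c
    pred_count = [0] * num_classes     # pixels predicted as c
    target_count = [0] * num_classes   # pixels labelled c in ground truth

    for p, t in zip(pred, target):
        if t == ignore_label:
            continue
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[t] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # skip classes absent from both pred and target
            ious.append(intersection[c] / union)

    if not ious:
        raise ValueError("no labelled pixels to evaluate")
    return 100.0 * sum(ious) / len(ious)


if __name__ == "__main__":
    # Per-class IoU is 1/3, 2/3 and 1/2, so the mean is 50 percent.
    print(mean_iou([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3))
```

Benchmark evaluations typically accumulate these counts over the whole test set before dividing, rather than averaging per-image scores, so a per-image call like the one above is only a toy case.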
Results
| # | Model | Mean IoU | Extra Data | SOTA badge | Paper | Date | Code |
|---|-------|----------|------------|------------|-------|------|------|
| 1 | OmniVec2 | 63.6 | Yes | - | - | - | - |
| 2 | DiffusionMMS (DAT++-S) | 61.5 | No | SOTA | Diffusion-based RGB-D Semantic Segmentation with Deformable Attention Transformer | 2024-09-23 | - |
| 3 | DepthMatch (DINOv2-S) | 61.4 | Yes | - | DepthMatch: Semi-Supervised RGB-D Scene Parsing through Depth-Guided Regularization | 2025-05-26 | - |
| 4 | GeminiFusion (Swin-Large) | 60.9 | Yes | SOTA | GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer | 2024-06-03 | Yes |
| 5 | OmniVec | 60.8 | Yes | SOTA | OmniVec: Learning robust representations with cross modal sharing | 2023-11-07 | - |
| 6 | GeminiFusion (Swin-Large) | 60.2 | No | - | GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer | 2024-06-03 | Yes |
| 7 | DPLNet | 59.3 | No | - | Efficient Multimodal Semantic Segmentation via Dual-Prompt Learning | 2023-12-01 | Yes |
| 8 | EMSANet (2x ResNet-34 NBt1D, PanopticNDT version, finetuned) | 59.02 | No | SOTA | PanopticNDT: Efficient and Robust Panoptic Mapping | 2023-09-24 | Yes |
| 9 | GeminiFusion (MiT-B5) | 57.7 | No | - | GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer | 2024-06-03 | Yes |
| 10 | GeminiFusion (MiT-B3) | 56.8 | No | - | GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer | 2024-06-03 | Yes |
| 11 | HAPNet | 55 | No | - | HAPNet: Toward Superior RGB-Thermal Scene Parsing via Hybrid, Asymmetric, and Progressive Heterogeneous Feature Fusion | 2024-04-04 | Yes |
| 12 | ICM | 50.7 | No | - | - | - | Yes |
| 13 | ESANet (R34-NBt1D) | 50.3 | No | SOTA | Efficient RGB-D Semantic Segmentation for Indoor Scene Analysis | 2020-11-13 | Yes |
| 14 | MTI-Net (HRNet-48) | 49 | No | SOTA | MTI-Net: Multi-Scale Task Interaction Networks for Multi-Task Learning | 2020-01-19 | Yes |
| 15 | ESANet (R18-NBt1D) | 48.17 | No | - | Efficient RGB-D Semantic Segmentation for Indoor Scene Analysis | 2020-11-13 | Yes |
| 16 | PGT (Swin-S) | 46.43 | No | - | Prompt Guided Transformer for Multi-Task Dense Prediction | 2023-07-28 | Yes |
| 17 | VCD+DeepLab (VGG16) | 45.3 | No | - | - | - | - |
| 18 | TD2-PSP50 | 43.5 | No | - | Temporally Distributed Networks for Fast Video Semantic Segmentation | 2020-04-03 | Yes |
| 19 | PGT (Swin-T) | 41.61 | No | - | Prompt Guided Transformer for Multi-Task Dense Prediction | 2023-07-28 | Yes |
| 20 | TD4-PSP18 | 37.4 | No | - | Temporally Distributed Networks for Fast Video Semantic Segmentation | 2020-04-03 | Yes |