Video Instance Segmentation on YouTube-VIS validation
Metric: AP50 (higher is better)
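As a rough illustration of the metric, AP50 is average precision with a prediction counted as correct when its IoU with a ground-truth instance reaches 0.5. In video instance segmentation the IoU is taken over whole mask sequences rather than single frames. The sketch below shows only that matching step, not the full precision/recall averaging. The mask representation and the names `VideoMask`, `video_iou`, and `is_match_at_50` are assumptions made for this example and do not come from the benchmark's evaluation code.

```python
from typing import Dict, Set, Tuple

# Assumed representation: {frame_index: set of (row, col) foreground pixels}.
VideoMask = Dict[int, Set[Tuple[int, int]]]


def video_iou(pred: VideoMask, gt: VideoMask) -> float:
    """Spatio-temporal IoU: intersection and union are summed over all frames."""
    inter = 0
    union = 0
    for t in set(pred) | set(gt):
        p = pred.get(t, set())  # a missing frame counts as an empty mask
        g = gt.get(t, set())
        inter += len(p & g)
        union += len(p | g)
    return inter / union if union else 0.0


def is_match_at_50(pred: VideoMask, gt: VideoMask) -> bool:
    """True when the sequence-level IoU reaches the 0.5 threshold used by AP50."""
    return video_iou(pred, gt) >= 0.5


# Two-frame toy example: perfect overlap in frame 0, partial overlap in frame 1.
gt_mask = {0: {(0, 0), (0, 1)}, 1: {(0, 0), (0, 1)}}
pred_mask = {0: {(0, 0), (0, 1)}, 1: {(0, 1), (0, 2)}}
print(video_iou(pred_mask, gt_mask))       # intersection 3, union 5
print(is_match_at_50(pred_mask, gt_mask))
```

A prediction that segments an object well in one frame but loses it in the others is penalized, because the frames it misses still add to the union.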
Results
| # | Model | AP50 | Extra Data | SOTA badge | Paper | Date | Code |
|---|---|---|---|---|---|---|---|
| 1 | CAVIS (ViT-L, Online) | 89.3 | Yes | SOTA | Context-Aware Video Instance Segmentation | 2024-07-03 | Code |
| 2 | DVIS++ (ViT-L, Online) | 88.8 | Yes | SOTA | DVIS++: Improved Decoupled Framework for Universal Video Segmentation | 2023-12-20 | Code |
| 3 | DVIS | 88 | Yes | SOTA | DVIS: Decoupled Video Instance Segmentation Framework | 2023-06-06 | Code |
| 4 | Tube-Link | 86.6 | No | SOTA | Tube-Link: A Flexible Cross Tube Framework for Universal Video Segmentation | 2023-03-22 | Code |
| 5 | MDQE (Swin-L) | 84.9 | No | - | MDQE: Mining Discriminative Query Embeddings to Segment Occluded Instances on Challenging Videos | 2023-03-25 | Code |
| 6 | Mask2Former (Swin-L) | 84.4 | No | SOTA | Mask2Former for Video Instance Segmentation | 2021-12-20 | Code |
| 7 | MinVIS (Swin-L) | 83.3 | No | - | MinVIS: A Minimal Video Instance Segmentation Framework without Video-based Training | 2022-08-03 | Code |
| 8 | UniVS (Swin-L) | 82.1 | Yes | - | UniVS: Unified and Universal Video Segmentation with Prompts as Queries | 2024-02-28 | Code |
| 9 | SeqFormer (Swin-L) | 82.1 | Yes | SOTA | SeqFormer: Sequential Transformer for Video Instance Segmentation | 2021-12-15 | Code |
| 10 | DeVIS (Swin-L) | 80.8 | No | - | DeVIS: Making Deformable Transformers Work for Video Instance Segmentation | 2022-07-22 | Code |
| 11 | Video K-Net (Swin-Base) | 79 | No | - | Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation | 2022-04-10 | Code |
| 12 | InstanceFormer (Swin-L) | 78 | Yes | - | InstanceFormer: An Online Video Instance Segmentation Framework | 2022-08-22 | Code |
| 13 | TCIS (Swin-S) | 76.6 | No | SOTA | 1st Place Solution for YouTubeVOS Challenge 2021: Video Instance Segmentation | 2021-06-12 | - |
| 14 | NOVIS (ResNet-50) | 75.7 | Yes | - | NOVIS: A Case for End-to-End Near-Online Video Instance Segmentation | 2023-08-29 | - |
| 15 | IDOL (ResNet-50) | 74 | No | - | In Defense of Online Models for Video Instance Segmentation | 2022-07-21 | Code |
| 16 | Mask2Former (ResNet-101) | 72.8 | No | - | Mask2Former for Video Instance Segmentation | 2021-12-20 | Code |
| 17 | SeqFormer (ResNet-101) | 71.1 | Yes | - | SeqFormer: Sequential Transformer for Video Instance Segmentation | 2021-12-15 | Code |
| 18 | SeqFormer (ResNet-50) | 69.8 | Yes | - | SeqFormer: Sequential Transformer for Video Instance Segmentation | 2021-12-15 | Code |
| 19 | MSN | 69.4 | No | - | MSN: Efficient Online Mask Selection Network for Video Instance Segmentation | 2021-06-19 | Code |
| 20 | InstanceFormer (ResNet-50) | 68.6 | Yes | - | InstanceFormer: An Online Video Instance Segmentation Framework | 2022-08-22 | Code |
| 21 | Mask2Former (ResNet-50) | 68 | No | - | Mask2Former for Video Instance Segmentation | 2021-12-20 | Code |
| 22 | SeqFormer (ResNet-50) | 66.9 | No | - | SeqFormer: Sequential Transformer for Video Instance Segmentation | 2021-12-15 | Code |
| 23 | DeVIS (ResNet-50) | 66.7 | No | - | DeVIS: Making Deformable Transformers Work for Video Instance Segmentation | 2022-07-22 | Code |
| 24 | IFC (ResNet-50) | 65.8 | No | SOTA | Video Instance Segmentation using Inter-Frame Communication Transformers | 2021-06-07 | Code |
| 25 | VisTR (ResNet-101) | 64 | No | SOTA | End-to-End Video Instance Segmentation with Transformers | 2020-11-30 | Code |
| 26 | VisTR (ResNet-50) | 59.8 | No | - | End-to-End Video Instance Segmentation with Transformers | 2020-11-30 | Code |
| 27 | ObjProp (ResNet-50) | 59.4 | No | - | Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation | 2021-11-15 | Code |
| 28 | CrossVIS (ResNet-101) | 57.3 | No | - | Crossover Learning for Fast Online Video Instance Segmentation | 2021-04-13 | Code |
| 29 | STC (ResNet-50) | 57.2 | No | - | STC: Spatio-Temporal Contrastive Learning for Video Instance Segmentation | 2022-02-08 | - |
| 30 | STMask (R101-DCN-FPN) | 56.8 | No | - | Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation | 2021-04-06 | Code |
| 31 | CompFeat (ResNet-50) | 56 | No | - | CompFeat: Comprehensive Feature Aggregation for Video Instance Segmentation | 2020-12-07 | Code |
| 32 | STEm-Seg (ResNet-101) | 55.8 | No | SOTA | STEm-Seg: Spatio-temporal Embeddings for Instance Segmentation in Videos | 2020-03-18 | Code |
| 33 | CSipMask | 55.6 | No | - | Occluded Video Instance Segmentation: A Benchmark | 2021-02-02 | Code |
| 34 | PCAN (ResNet-50) | 54.9 | No | - | Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation | 2021-06-22 | Code |
| 35 | SipMask (ResNet-50, ms-train, single-scale test) | 54.1 | No | - | SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation | 2020-07-29 | Code |
| 36 | SipMask (ResNet-50, single-scale test) | 53 | No | - | SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation | 2020-07-29 | Code |
| 37 | CMaskTrack R-CNN | 52.8 | No | - | Occluded Video Instance Segmentation: A Benchmark | 2021-02-02 | Code |
| 38 | TraDeS | 52.6 | No | - | Track to Detect and Segment: An Online Multi-Object Tracker | 2021-03-16 | Code |
| 39 | MaskTrack R-CNN (ResNet-50, single-scale training and test) | 51.1 | No | SOTA | Video Instance Segmentation | 2019-05-12 | Code |
| 40 | STEm-Seg (ResNet-50) | 50.7 | No | - | STEm-Seg: Spatio-temporal Embeddings for Instance Segmentation in Videos | 2020-03-18 | Code |
| 41 | DeepSORT | 31.3 | No | SOTA | Simple Online and Realtime Tracking with a Deep Association Metric | 2017-03-21 | Code |
| 42 | OSMN | 28.6 | No | - | Efficient Video Object Segmentation via Network Modulation | 2018-02-04 | Code |