Video Instance Segmentation on YouTube-VIS 2021
Metric: AP50 (higher is better)
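As a rough illustration of what the metric measures: AP50 is average precision at an IoU threshold of 0.5, and for video instance segmentation the IoU is usually taken over whole mask tracks, with intersections and unions summed across frames. The sketch below assumes that definition. The names `video_iou` and `matches_at_50` are hypothetical, masks are represented as sets of pixel coordinates for simplicity, and this is not the benchmark's official evaluation code.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]
# One pixel set per frame; an empty set means the instance is absent in that frame.
Track = List[Set[Pixel]]


def video_iou(pred: Track, gt: Track) -> float:
    """Spatio-temporal IoU: sum intersections and unions over all frames."""
    if len(pred) != len(gt):
        raise ValueError("tracks must cover the same number of frames")
    inter = sum(len(p & g) for p, g in zip(pred, gt))
    union = sum(len(p | g) for p, g in zip(pred, gt))
    return inter / union if union else 0.0


def matches_at_50(pred: Track, gt: Track) -> bool:
    """Assumed matching rule: a prediction can count as a true positive
    when its track-level IoU with a ground-truth track is at least 0.5."""
    return video_iou(pred, gt) >= 0.5


# Ground truth covers two pixels in both frames; the prediction is perfect
# in frame 0 but misses the instance entirely in frame 1.
gt_track = [{(0, 0), (0, 1)}, {(0, 0), (0, 1)}]
pred_track = [{(0, 0), (0, 1)}, set()]
print(video_iou(pred_track, gt_track), matches_at_50(pred_track, gt_track))
```

Because the IoU is computed over the whole track, a prediction that loses the object for half the video is penalized even if its per-frame masks are otherwise exact.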
Results
| # | Model | AP50 | Extra Data | Badge | Paper | Date | Code |
|---|-------|------|------------|-------|-------|------|------|
| 1 | CAVIS(VIT-L, Offline) | 87.3 | Yes | SOTA | Context-Aware Video Instance Segmentation | 2024-07-03 | Code |
| 2 | DVIS++(VIT-L, Offline) | 86.7 | Yes | SOTA | DVIS++: Improved Decoupled Framework for Universal Video Segmentation | 2023-12-20 | Code |
| 3 | DVIS-DAQ(VIT-L, Offline) | 86.1 | Yes | | DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries | 2024-03-29 | Code |
| 4 | RefineVIS (Swin-L, online) | 84.1 | Yes | SOTA | RefineVIS: Video Instance Segmentation with Temporal Attention Refinement | 2023-06-07 | - |
| 5 | DVIS(Swin-L) | 83 | Yes | SOTA | DVIS: Decoupled Video Instance Segmentation Framework | 2023-06-06 | Code |
| 6 | DVIS++(VIT-L, Online) | 82.7 | Yes | | DVIS++: Improved Decoupled Framework for Universal Video Segmentation | 2023-12-20 | Code |
| 7 | NOVIS (Swin-L) | 82 | Yes | | NOVIS: A Case for End-to-End Near-Online Video Instance Segmentation | 2023-08-29 | - |
| 8 | TarViS (Swin-L) | 81.4 | Yes | SOTA | TarViS: A Unified Approach for Target-based Video Segmentation | 2023-01-06 | Code |
| 9 | GRAtt-VIS (Swin-L) | 81.3 | Yes | | GRAtt-VIS: Gated Residual Attention for Auto Rectifying Video Instance Segmentation | 2023-05-26 | Code |
| 10 | GenVIS (Swin-L) | 80.9 | Yes | SOTA | A Generalized Framework for Video Instance Segmentation | 2022-11-16 | Code |
| 11 | IDOL (Swin-L) | 80.8 | No | SOTA | In Defense of Online Models for Video Instance Segmentation | 2022-07-21 | Code |
| 12 | MDQE(Swin-L) | 80.7 | No | | MDQE: Mining Discriminative Query Embeddings to Segment Occluded Instances on Challenging Videos | 2023-03-25 | Code |
| 13 | VITA (Swin-L) | 80.6 | Yes | SOTA | VITA: Video Instance Segmentation via Object Token Association | 2022-06-09 | Code |
| 14 | Tube-Link(Swin-L) | 79.4 | No | | Tube-Link: A Flexible Cross Tube Framework for Universal Video Segmentation | 2023-03-22 | Code |
| 15 | UniVS(Swin-L) | 79.4 | Yes | | UniVS: Unified and Universal Video Segmentation with Prompts as Queries | 2024-02-28 | Code |
| 16 | DeVIS (Swin-L) | 77.7 | No | | DeVIS: Making Deformable Transformers Work for Video Instance Segmentation | 2022-07-22 | Code |
| 17 | MinVIS (Swin-L) | 76.6 | No | | MinVIS: A Minimal Video Instance Segmentation Framework without Video-based Training | 2022-08-03 | Code |
| 18 | BoxVIS(Swin-L & Box-sup) | 76.4 | No | | BoxVIS: Video Instance Segmentation with Box Annotations | 2023-03-26 | Code |
| 19 | InstanceFormer (Swin-L) | 73.7 | Yes | | InstanceFormer: An Online Video Instance Segmentation Framework | 2022-08-22 | Code |
| 20 | TarViS (Swin-T) | 71.6 | Yes | | TarViS: A Unified Approach for Target-based Video Segmentation | 2023-01-06 | Code |
| 21 | TarViS (ResNet-50) | 69.6 | Yes | | TarViS: A Unified Approach for Target-based Video Segmentation | 2023-01-06 | Code |
| 22 | NOVIS (ResNet-50) | 69.4 | Yes | | NOVIS: A Case for End-to-End Near-Online Video Instance Segmentation | 2023-08-29 | - |
| 23 | GRAtt-VIS (ResNet-50) | 69.2 | Yes | | GRAtt-VIS: Gated Residual Attention for Auto Rectifying Video Instance Segmentation | 2023-05-26 | Code |
| 24 | DeVIS (ResNet-50) | 66.8 | No | | DeVIS: Making Deformable Transformers Work for Video Instance Segmentation | 2022-07-22 | Code |
| 25 | InstanceFormer (ResNet-50) | 62.4 | Yes | | InstanceFormer: An Online Video Instance Segmentation Framework | 2022-08-22 | Code |
| 26 | STMask(R101-DCN-FPN) | 54 | No | SOTA | Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation | 2021-04-06 | Code |