Video Instance Segmentation on OVIS validation
Metric: AP75 (higher is better)
Results
| # | Model | AP75 | Extra Data | SOTA | Paper | Date | Code |
|---|-------|------|------------|------|-------|------|------|
| 1 | CAVIS(VIT-L, Offline) | 63.5 | Yes | SOTA | Context-Aware Video Instance Segmentation | 2024-07-03 | Yes |
| 2 | DVIS-DAQ(VIT-L, Offline) | 62.9 | Yes | SOTA | DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries | 2024-03-29 | Yes |
| 3 | DVIS++(VIT-L,Offline) | 58.5 | Yes | SOTA | DVIS++: Improved Decoupled Framework for Universal Video Segmentation | 2023-12-20 | Yes |
| 4 | GLEE-Pro | 55.5 | Yes | SOTA | General Object Foundation Model for Images and Videos at Scale | 2023-12-14 | Yes |
| 5 | DVIS++(VIT-L, Online) | 55 | Yes | | DVIS++: Improved Decoupled Framework for Universal Video Segmentation | 2023-12-20 | Yes |
| 6 | DVIS(Swin-L, Offline) | 53 | No | SOTA | DVIS: Decoupled Video Instance Segmentation Framework | 2023-06-06 | Yes |
| 7 | UNINEXT (ViT-H, Online) | 52.2 | Yes | SOTA | Universal Instance Perception as Object Discovery and Retrieval | 2023-03-12 | Yes |
| 8 | DVIS(Swin-L, Online) | 49.2 | No | | DVIS: Decoupled Video Instance Segmentation Framework | 2023-06-06 | Yes |
| 9 | RefineVIS (Swin-L, offline) | 48.4 | Yes | | RefineVIS: Video Instance Segmentation with Temporal Attention Refinement | 2023-06-07 | - |
| 10 | GRAtt-VIS (Swin-L) | 47.8 | Yes | | GRAtt-VIS: Gated Residual Attention for Auto Rectifying Video Instance Segmentation | 2023-05-26 | Yes |
| 11 | GenVIS (Swin-L) | 47.8 | Yes | SOTA | A Generalized Framework for Video Instance Segmentation | 2022-11-16 | Yes |
| 12 | CTVIS (Swin-L) | 47.5 | Yes | | CTVIS: Consistent Training for Online Video Instance Segmentation | 2023-07-24 | Yes |
| 13 | IDOL (Swin-L) | 45.2 | No | SOTA | In Defense of Online Models for Video Instance Segmentation | 2022-07-21 | Yes |
| 14 | TarViS (Swin-L) | 44.6 | Yes | | TarViS: A Unified Approach for Target-based Video Segmentation | 2023-01-06 | Yes |
| 15 | MDQE(SwinL) | 44.3 | No | | MDQE: Mining Discriminative Query Embeddings to Segment Occluded Instances on Challenging Videos | 2023-03-25 | Yes |
| 16 | NOVIS (Swin-L) | 43.8 | Yes | | NOVIS: A Case for End-to-End Near-Online Video Instance Segmentation | 2023-08-29 | - |
| 17 | ROVIS (Swin-L) | 42.6 | No | | Robust Online Video Instance Segmentation with Track Queries | 2022-11-16 | Yes |
| 18 | MinVIS (Swin-L) | 41.3 | No | | MinVIS: A Minimal Video Instance Segmentation Framework without Video-based Training | 2022-08-03 | Yes |
| 19 | DVIS++(R50, Offline) | 40.9 | Yes | | DVIS++: Improved Decoupled Framework for Universal Video Segmentation | 2023-12-20 | Yes |
| 20 | BoxVIS(Swin-L & Box-sup) | 39.9 | No | | BoxVIS: Video Instance Segmentation with Box Annotations | 2023-03-26 | Yes |
| 21 | DeVIS (Swin-L) | 38.3 | No | | DeVIS: Making Deformable Transformers Work for Video Instance Segmentation | 2022-07-22 | Yes |
| 22 | DVIS++(R50, Online) | 37.3 | Yes | | DVIS++: Improved Decoupled Framework for Universal Video Segmentation | 2023-12-20 | Yes |
| 23 | GRAtt-VIS (ResNet-50) | 36.8 | Yes | | GRAtt-VIS: Gated Residual Attention for Auto Rectifying Video Instance Segmentation | 2023-05-26 | Yes |
| 24 | UNINEXT (ResNet-50, Online) | 35.6 | Yes | | Universal Instance Perception as Object Discovery and Retrieval | 2023-03-12 | Yes |
| 25 | CTVIS (ResNet-50) | 34.9 | Yes | | CTVIS: Consistent Training for Online Video Instance Segmentation | 2023-07-24 | Yes |
| 26 | TarViS (Swin-T) | 34.4 | Yes | | TarViS: A Unified Approach for Target-based Video Segmentation | 2023-01-06 | Yes |
| 27 | NOVIS (ResNet-50) | 32.6 | Yes | | NOVIS: A Case for End-to-End Near-Online Video Instance Segmentation | 2023-08-29 | - |
| 28 | TarViS (ResNet-50) | 30.4 | Yes | | TarViS: A Unified Approach for Target-based Video Segmentation | 2023-01-06 | Yes |
| 29 | Tube-Link(ResNet-50) | 30.2 | No | | Tube-Link: A Flexible Cross Tube Framework for Universal Video Segmentation | 2023-03-22 | Yes |
| 30 | IDOL (ResNet-50) | 30 | No | | In Defense of Online Models for Video Instance Segmentation | 2022-07-21 | Yes |
| 31 | VITA (Swin-L) | 24.9 | Yes | SOTA | VITA: Video Instance Segmentation via Object Token Association | 2022-06-09 | Yes |
| 32 | InstanceFormer (Swin-L) | 21.61 | Yes | | InstanceFormer: An Online Video Instance Segmentation Framework | 2022-08-22 | Yes |
| 33 | DeVIS (ResNet-50) | 20.8 | No | | DeVIS: Making Deformable Transformers Work for Video Instance Segmentation | 2022-07-22 | Yes |
| 34 | InstanceFormer(ResNet-50) | 18.1 | Yes | | InstanceFormer: An Online Video Instance Segmentation Framework | 2022-08-22 | Yes |
| 35 | CrossVIS (ResNet-50, calibration) | 16.9 | No | SOTA | Crossover Learning for Fast Online Video Instance Segmentation | 2021-04-13 | Yes |
| 36 | STMask(R101-DCN-FPN) | 15.2 | No | SOTA | Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation | 2021-04-06 | Yes |
| 37 | TeViT (ResNet-50) | 15 | No | | Temporally Efficient Vision Transformer for Video Instance Segmentation | 2022-04-18 | Yes |
| 38 | Mask2Former-VIS | 14.1 | No | | Mask2Former for Video Instance Segmentation | 2021-12-20 | Yes |
| 39 | D2Conv3D (ResNet-50) | 13.7 | No | | - | - | Yes |
| 40 | STC (ResNet-50) | 13.4 | No | | STC: Spatio-Temporal Contrastive Learning for Video Instance Segmentation | 2022-02-08 | - |
| 41 | CMaskTrack R-CNN (ResNet-50) | 13.1 | No | SOTA | Occluded Video Instance Segmentation: A Benchmark | 2021-02-02 | Yes |
| 42 | CSipMask (ResNet-50) | 12.5 | No | | Occluded Video Instance Segmentation: A Benchmark | 2021-02-02 | Yes |
| 43 | CrossVIS (ResNet-50) | 12.1 | No | | Crossover Learning for Fast Online Video Instance Segmentation | 2021-04-13 | Yes |