Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Instance Segmentation on Refer-YouTube-VOS (2021 public validation)

Metric: J (higher is better)
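
J here is read as the region similarity used in video object segmentation benchmarks, that is, the Jaccard index (intersection over union) between a predicted mask and its ground-truth mask, reported as a percentage. The page itself does not define the metric or the averaging protocol, so the following is a minimal sketch under that assumption. The names `jaccard` and `mean_j` are hypothetical, and the official evaluation may average over objects and expressions differently.

```python
from typing import Sequence

# A binary mask as rows of 0/1 values (assumed representation for this sketch).
Mask = Sequence[Sequence[int]]


def jaccard(pred: Mask, gt: Mask) -> float:
    """Intersection over union of two same-sized binary masks."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def mean_j(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """Average per-frame Jaccard over a sequence, as a percentage."""
    scores = [jaccard(p, g) for p, g in zip(preds, gts)]
    return 100.0 * sum(scores) / len(scores)
```

For example, a prediction that covers two pixels where the ground truth covers one of them scores 0.5 on that frame; averaging per-frame scores gives a single number on the same 0 to 100 scale as the J column.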


Results

| # | Model | J | Extra Data | Paper | Date | Code |
|---|-------|---|------------|-------|------|------|
| 1 | MPG-SAM 2 | 71.7 | No | MPG-SAM 2: Adapting SAM 2 with Mask Priors and G... | 2025-01-23 | Code |
| 2 | VRS-HQ (Chat-UniVi-13B) | 69 | No | The Devil is in Temporal Token: High Quality Vid... | 2025-01-15 | Code |
| 3 | GLEE-Pro | 68.2 | Yes | General Object Foundation Model for Images and V... | 2023-12-14 | Code |
| 4 | UNINEXT-H | 67.6 | No | Universal Instance Perception as Object Discover... | 2023-03-12 | Code |
| 5 | ReferDINO (Swin-B) | 67 | No | ReferDINO: Referring Video Object Segmentation w... | 2025-01-24 | - |
| 6 | MUTR | 66.4 | No | Referred by Multi-Modality: A Unified Temporal T... | 2023-05-25 | Code |
| 7 | UniRef-L (Swin-L) | 65.5 | No | - | - | - |
| 8 | VLP (VLMo-L) | 65.3 | No | Harnessing Vision-Language Pretrained Models wit... | 2024-05-17 | - |
| 9 | SOC (Joint training, Video-Swin-B) | 65.3 | No | SOC: Semantic-Assisted Object Cluster for Referr... | 2023-05-26 | Code |
| 10 | HTR (Pre-training) | 65.3 | No | Temporally Consistent Referring Video Object Seg... | 2024-03-28 | Code |
| 11 | DsHmp (Video-Swin-Base) | 65 | No | Decoupling Static and Hierarchical Motion Percep... | 2024-04-04 | Code |
| 12 | UniRef++-L | 64.8 | No | UniRef++: Segment Every Reference Object in Spat... | 2023-12-25 | Code |
| 13 | ViLLa | 64.6 | No | ViLLa: Video Reasoning Segmentation with Large L... | 2024-07-18 | Code |
| 14 | GroPrompt | 64.1 | No | GroPrompt: Efficient Grounded Prompting and Adap... | 2024-06-18 | - |
| 15 | SgMg (Pre-training) | 63.9 | No | Spectrum-guided Multi-granularity Referring Vide... | 2023-07-25 | Code |
| 16 | EPCFormer (ViT-H) | 62.9 | No | Expression Prompt Collaboration Transformer for ... | 2023-08-08 | - |
| 17 | UniLSeg-100 | 62.8 | No | Universal Segmentation at Arbitrary Granularity ... | 2023-12-04 | Code |
| 18 | LoSh-R | 62.5 | Yes | LoSh: Long-Short Text Joint Prediction Network f... | 2023-06-14 | Code |
| 19 | VLT | 61.9 | No | VLT: Vision-Language Transformer and Query Gener... | 2022-10-28 | Code |
| 20 | OnlineRefer (Swin-L, online) | 61.6 | No | OnlineRefer: A Simple Online Baseline for Referr... | 2023-07-18 | Code |
| 21 | R2VOS (Video-Swin-T) | 59.6 | Yes | Towards Robust Referring Video Object Segmentati... | 2022-07-04 | Code |
| 22 | SOC (Video-Swin-T) | 57.8 | No | SOC: Semantic-Assisted Object Cluster for Referr... | 2023-05-26 | Code |
| 23 | UniVS (Swin-L) | 56.8 | Yes | UniVS: Unified and Universal Video Segmentation ... | 2024-02-28 | Code |
| 24 | ReferFormer (ResNet-101) | 56.1 | Yes | Language as Queries for Referring Video Object S... | 2022-01-03 | Code |
| 25 | ReferFormer (ResNet-50) | 54.8 | Yes | Language as Queries for Referring Video Object S... | 2022-01-03 | Code |
| 26 | MANET | 54.75 | No | Multi-Attention Network for Compressed Video Ref... | 2022-07-26 | Code |
| 27 | MTTR (w=12) | 54 | No | End-to-End Referring Video Object Segmentation w... | 2021-11-29 | Code |
| 28 | MLRLSA | 50.96 | No | - | - | - |
| 29 | Locater | 48.8 | No | Local-Global Context Aware Transformer for Langu... | 2022-03-18 | Code |
| 30 | VLIDE | 48.44 | No | Deeply Interleaved Two-Stream Encoder for Referr... | 2022-03-30 | - |
| 31 | URVOS | 47 | No | - | - | Code |