Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Instance Segmentation on Refer-YouTube-VOS (2021 public validation)

Metric: J&F (higher is better)
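
For readers unfamiliar with the metric, the following is a minimal sketch of how a J&F score is commonly formed in DAVIS-style video segmentation evaluation, assuming this leaderboard follows that convention: J is the region similarity (intersection over union of predicted and ground-truth masks), F is the contour accuracy, and J&F is their mean. The function names `region_similarity` and `j_and_f` are hypothetical and not taken from any benchmark toolkit. The contour term F requires boundary matching and is not computed here; it is passed in as a value.

```python
def region_similarity(pred, gt):
    """J: intersection over union of two binary masks given as nested lists."""
    inter = 0
    union = 0
    for row_p, row_g in zip(pred, gt):
        for p, g in zip(row_p, row_g):
            if p and g:
                inter += 1
            if p or g:
                union += 1
    # Assumption: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def j_and_f(j_mean, f_mean):
    """J&F: arithmetic mean of region similarity and contour accuracy."""
    return (j_mean + f_mean) / 2.0


# Toy example: a 2x2 prediction that covers the target plus one extra pixel.
pred_mask = [[1, 1], [0, 0]]
gt_mask = [[1, 0], [0, 0]]
j = region_similarity(pred_mask, gt_mask)
score = j_and_f(j, 0.7)  # 0.7 is a made-up F value for illustration only
```

The leaderboard reports scores as percentages, so a value computed this way would be multiplied by 100 before comparison with the table.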

Results

# | Model | J&F | Extra Data | Paper | Date | Code
1 | MPG-SAM 2 | 73.9 | No | MPG-SAM 2: Adapting SAM 2 with Mask Priors and G... | 2025-01-23 | Code
2 | VRS-HQ (Chat-UniVi-13B) | 71 | No | The Devil is in Temporal Token: High Quality Vid... | 2025-01-15 | Code
3 | GLEE-Pro | 70.6 | Yes | General Object Foundation Model for Images and V... | 2023-12-14 | Code
4 | UNINEXT-H | 70.1 | No | Universal Instance Perception as Object Discover... | 2023-03-12 | Code
5 | ReferDINO (Swin-B) | 69.3 | No | ReferDINO: Referring Video Object Segmentation w... | 2025-01-24 | -
6 | MUTR | 68.4 | No | Referred by Multi-Modality: A Unified Temporal T... | 2023-05-25 | Code
7 | VLP (VLMo-L) | 67.6 | No | Harnessing Vision-Language Pretrained Models wit... | 2024-05-17 | -
8 | UniRef-L (Swin-L) | 67.4 | No | - | - | -
9 | HTR (Pre-training) | 67.1 | No | Temporally Consistent Referring Video Object Seg... | 2024-03-28 | Code
10 | DsHmp (Video-Swin-Base) | 67.1 | No | Decoupling Static and Hierarchical Motion Percep... | 2024-04-04 | Code
11 | UniRef++-L | 66.9 | No | UniRef++: Segment Every Reference Object in Spat... | 2023-12-25 | Code
12 | ViLLa | 66.5 | No | ViLLa: Video Reasoning Segmentation with Large L... | 2024-07-18 | Code
13 | DEVA (ReferFormer) | 66 | Yes | Tracking Anything with Decoupled Video Segmentat... | 2023-09-07 | Code
14 | SgMg (Pre-training) | 65.7 | No | Spectrum-guided Multi-granularity Referring Vide... | 2023-07-25 | Code
15 | GroPrompt | 65.5 | No | GroPrompt: Efficient Grounded Prompting and Adap... | 2024-06-18 | -
16 | EPCFormer (ViT-H) | 65 | No | Expression Prompt Collaboration Transformer for ... | 2023-08-08 | -
17 | UniLSeg-100 | 64.9 | No | Universal Segmentation at Arbitrary Granularity ... | 2023-12-04 | Code
18 | LoSh-R | 64.2 | Yes | LoSh: Long-Short Text Joint Prediction Network f... | 2023-06-14 | Code
19 | VLT | 63.8 | No | VLT: Vision-Language Transformer and Query Gener... | 2022-10-28 | Code
20 | OnlineRefer (Swin-L, online) | 63.5 | No | OnlineRefer: A Simple Online Baseline for Referr... | 2023-07-18 | Code
21 | R2VOS (Video-Swin-T) | 61.3 | Yes | Towards Robust Referring Video Object Segmentati... | 2022-07-04 | Code
22 | SOC (Video-Swin-T) | 59.2 | No | SOC: Semantic-Assisted Object Cluster for Referr... | 2023-05-26 | Code
23 | UniVS(Swin-L) | 58 | Yes | UniVS: Unified and Universal Video Segmentation ... | 2024-02-28 | Code
24 | ReferFormer (ResNet-101) | 57.3 | Yes | Language as Queries for Referring Video Object S... | 2022-01-03 | Code
25 | MANET | 55.63 | No | Multi-Attention Network for Compressed Video Ref... | 2022-07-26 | Code
26 | ReferFormer (ResNet-50) | 55.6 | Yes | Language as Queries for Referring Video Object S... | 2022-01-03 | Code
27 | MTTR (w=12) | 55.32 | No | End-to-End Referring Video Object Segmentation w... | 2021-11-29 | Code
28 | Locater | 50 | No | Local-Global Context Aware Transformer for Langu... | 2022-03-18 | Code
29 | MLRLSA | 49.7 | No | - | - | -
30 | VLIDE | 49.56 | No | Deeply Interleaved Two-Stream Encoder for Referr... | 2022-03-30 | -
31 | URVOS | 48.9 | No | - | - | Code
32 | InternVideo2.5 | 34.2 | No | InternVideo2.5: Empowering Video MLLMs with Long... | 2025-01-21 | Code