Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Referring Expression Segmentation on Refer-YouTube-VOS (2021 public validation)

Metric: F (contour accuracy, i.e. boundary F-measure; higher is better)
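
The F score is the boundary (contour) F-measure used in DAVIS-style J&F evaluation: predicted and ground-truth mask boundaries are matched within a small pixel tolerance, and F is the harmonic mean of boundary precision and recall. Below is a minimal NumPy sketch for a single frame under simplifying assumptions: a boundary pixel is a mask pixel with at least one background 8-neighbour, matching uses a square window of hypothetical radius `tol` pixels (the official DAVIS toolkit instead derives the tolerance from the image diagonal and dilates with a disk), and no averaging over frames or objects is done. The leaderboard values appear to be such scores averaged over the benchmark and expressed as percentages. The function names are illustrative only.

```python
import numpy as np


def _shift_stack(mask: np.ndarray, radius: int) -> np.ndarray:
    """Stack every shift of `mask` inside a (2r+1) x (2r+1) square window."""
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    h, w = mask.shape
    shifts = [
        padded[radius + dy : radius + dy + h, radius + dx : radius + dx + w]
        for dy in range(-radius, radius + 1)
        for dx in range(-radius, radius + 1)
    ]
    return np.stack(shifts)


def boundary(mask: np.ndarray) -> np.ndarray:
    """Mask pixels that touch the background (simplified 8-neighbour boundary)."""
    mask = mask.astype(bool)
    eroded = _shift_stack(mask, 1).all(axis=0)  # 3x3 square erosion
    return mask & ~eroded


def dilate(mask: np.ndarray, radius: int) -> np.ndarray:
    """Square dilation; stands in for the disk used by the official toolkit."""
    return _shift_stack(mask, radius).any(axis=0)


def boundary_f(pred: np.ndarray, gt: np.ndarray, tol: int = 1) -> float:
    """Boundary F-measure of one predicted mask against one ground-truth mask.

    `tol` is a hypothetical fixed matching tolerance in pixels.
    """
    pb, gb = boundary(pred), boundary(gt)
    n_pred, n_gt = pb.sum(), gb.sum()
    if n_pred == 0 and n_gt == 0:
        return 1.0  # both empty: treat as a perfect match
    if n_pred == 0 or n_gt == 0:
        return 0.0
    # Precision: predicted boundary pixels lying near some ground-truth boundary.
    precision = (pb & dilate(gb, tol)).sum() / n_pred
    # Recall: ground-truth boundary pixels lying near some predicted boundary.
    recall = (gb & dilate(pb, tol)).sum() / n_gt
    if precision + recall == 0:
        return 0.0
    return float(2 * precision * recall / (precision + recall))


if __name__ == "__main__":
    gt = np.zeros((10, 10), dtype=bool)
    gt[2:6, 2:6] = True
    pred = np.zeros((10, 10), dtype=bool)
    pred[2:6, 3:7] = True  # same square shifted one pixel to the right
    print(boundary_f(pred, gt, tol=0), boundary_f(pred, gt, tol=1))
```

With zero tolerance the one-pixel shift halves both boundary precision and recall, while a one-pixel tolerance absorbs it entirely, which is why the tolerance choice matters when comparing F scores.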

Results

| # | Model | F | Extra Data | Paper | Date | Code |
|---|-------|---|------------|-------|------|------|
| 1 | MPG-SAM 2 | 76.1 | No | MPG-SAM 2: Adapting SAM 2 with Mask Priors and G... | 2025-01-23 | Code |
| 2 | VRS-HQ (Chat-UniVi-13B) | 73.1 | No | The Devil is in Temporal Token: High Quality Vid... | 2025-01-15 | Code |
| 3 | GLEE-Pro | 72.9 | Yes | General Object Foundation Model for Images and V... | 2023-12-14 | Code |
| 4 | UNINEXT-H | 72.7 | No | Universal Instance Perception as Object Discover... | 2023-03-12 | Code |
| 5 | ReferDINO (Swin-B) | 71.5 | No | ReferDINO: Referring Video Object Segmentation w... | 2025-01-24 | - |
| 6 | MUTR | 70.4 | No | Referred by Multi-Modality: A Unified Temporal T... | 2023-05-25 | Code |
| 7 | VLP (VLMo-L) | 69.8 | No | Harnessing Vision-Language Pretrained Models wit... | 2024-05-17 | - |
| 8 | SOC (Joint training, Video-Swin-B) | 69.3 | No | SOC: Semantic-Assisted Object Cluster for Referr... | 2023-05-26 | Code |
| 9 | UniRef-L (Swin-L) | 69.2 | No | - | - | - |
| 10 | DsHmp (Video-Swin-Base) | 69.1 | No | Decoupling Static and Hierarchical Motion Percep... | 2024-04-04 | Code |
| 11 | UniRef++-L | 69 | No | UniRef++: Segment Every Reference Object in Spat... | 2023-12-25 | Code |
| 12 | HTR (Pre-training) | 68.9 | No | Temporally Consistent Referring Video Object Seg... | 2024-03-28 | Code |
| 13 | ViLLa | 68.6 | No | ViLLa: Video Reasoning Segmentation with Large L... | 2024-07-18 | Code |
| 14 | SgMg (Pre-training) | 67.4 | No | Spectrum-guided Multi-granularity Referring Vide... | 2023-07-25 | Code |
| 15 | EPCFormer (ViT-H) | 67.2 | No | Expression Prompt Collaboration Transformer for ... | 2023-08-08 | - |
| 16 | UniLSeg-100 | 67 | No | Universal Segmentation at Arbitrary Granularity ... | 2023-12-04 | Code |
| 17 | GroPrompt | 66.9 | No | GroPrompt: Efficient Grounded Prompting and Adap... | 2024-06-18 | - |
| 18 | LoSh-R | 66 | Yes | LoSh: Long-Short Text Joint Prediction Network f... | 2023-06-14 | Code |
| 19 | VLT | 65.6 | No | VLT: Vision-Language Transformer and Query Gener... | 2022-10-28 | Code |
| 20 | OnlineRefer (Swin-L, online) | 65.5 | No | OnlineRefer: A Simple Online Baseline for Referr... | 2023-07-18 | Code |
| 21 | R2VOS (Video-Swin-T) | 63.1 | Yes | Towards Robust Referring Video Object Segmentati... | 2022-07-04 | Code |
| 22 | SOC (Video-Swin-T) | 60.5 | No | SOC: Semantic-Assisted Object Cluster for Referr... | 2023-05-26 | Code |
| 23 | UniVS (Swin-L) | 59.5 | Yes | UniVS: Unified and Universal Video Segmentation ... | 2024-02-28 | Code |
| 24 | ReferFormer (ResNet-101) | 58.4 | Yes | Language as Queries for Referring Video Object S... | 2022-01-03 | Code |
| 25 | MTTR (w=12) | 56.64 | No | End-to-End Referring Video Object Segmentation w... | 2021-11-29 | Code |
| 26 | ReferFormer (ResNet-50) | 56.6 | Yes | Language as Queries for Referring Video Object S... | 2022-01-03 | Code |
| 27 | MANET | 56.51 | No | Multi-Attention Network for Compressed Video Ref... | 2022-07-26 | Code |
| 28 | Locater | 51.1 | No | Local-Global Context Aware Transformer for Langu... | 2022-03-18 | Code |
| 29 | URVOS | 50.8 | No | - | - | Code |
| 30 | VLIDE | 50.67 | No | Deeply Interleaved Two-Stream Encoder for Referr... | 2022-03-30 | - |
| 31 | MLRLSA | 48.43 | No | - | - | - |
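
The Extra Data column flags entries that used additional training data, so a like-for-like comparison often looks only at rows marked No. A minimal Python sketch of that filtering, using a handful of rows copied from the leaderboard above; the `Entry` type and `rank` helper are hypothetical illustrations, not part of any Papers With Code API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    model: str
    f: float          # F score as listed on the leaderboard
    extra_data: bool  # True where the Extra Data column says "Yes"


# A subset of rows copied from the leaderboard (not the full table).
ENTRIES = [
    Entry("MPG-SAM 2", 76.1, False),
    Entry("VRS-HQ (Chat-UniVi-13B)", 73.1, False),
    Entry("GLEE-Pro", 72.9, True),
    Entry("UNINEXT-H", 72.7, False),
    Entry("LoSh-R", 66.0, True),
    Entry("VLT", 65.6, False),
]


def rank(entries: list[Entry], allow_extra_data: bool = True) -> list[Entry]:
    """Sort by F (descending), optionally dropping extra-data entries."""
    pool = [e for e in entries if allow_extra_data or not e.extra_data]
    return sorted(pool, key=lambda e: e.f, reverse=True)


if __name__ == "__main__":
    for i, e in enumerate(rank(ENTRIES, allow_extra_data=False), start=1):
        print(i, e.model, e.f)
```

On this subset, GLEE-Pro holds third place overall but drops out once extra-data entries are excluded, and UNINEXT-H moves up to third.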