Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

MUTR

Reported on 24 benchmarks across 4 tasks · 1 paper · 18 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Computer Vision · 24 results

  • Video on Ref-DAVIS17
    F · 2023-05-25
    71.3
    best: 78.5 (FindTrack)
    SOTA
    Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation · arXiv:2305.16318
  • Video on Ref-DAVIS17
    J · 2023-05-25
    64.8
    best: 69.9 (FindTrack)
    SOTA
    Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation · arXiv:2305.16318
  • Video on Ref-DAVIS17
    J&F · 2023-05-25
    68
    best: 74.2 (FindTrack)
    SOTA
    Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation · arXiv:2305.16318
  • Video on Long-RVOS
    J&F · 2023-05-25
    42.2
    best: 51.3 (ReferMo)
    SOTA
    Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation · arXiv:2305.16318
  • Video on Long-RVOS
    tIoU · 2023-05-25
    70.4
    best: 71.7 (ReferDINO)
    SOTA
    Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation · arXiv:2305.16318
  • Video on Long-RVOS
    vIoU · 2023-05-25
    36.2
    best: 42.6 (ReferMo)
    SOTA
    Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation · arXiv:2305.16318
  • Instance Segmentation on Referring Expressions for DAVIS 2016 & 2017
    F · 2023-05-25
    71.3
    SOTA
    Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation · arXiv:2305.16318
  • Instance Segmentation on Referring Expressions for DAVIS 2016 & 2017
    J · 2023-05-25
    64.8
    SOTA
    Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation · arXiv:2305.16318
  • Instance Segmentation on Referring Expressions for DAVIS 2016 & 2017
    J&F 1st frame · 2023-05-25
    68
    SOTA
    Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation · arXiv:2305.16318
  • Video Object Segmentation on Ref-DAVIS17
    F · 2023-05-25
    71.3
    best: 78.5 (FindTrack)
    SOTA
    Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation · arXiv:2305.16318
  • Video Object Segmentation on Ref-DAVIS17
    J · 2023-05-25
    64.8
    best: 69.9 (FindTrack)
    SOTA
    Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation · arXiv:2305.16318
  • Video Object Segmentation on Ref-DAVIS17
    J&F · 2023-05-25
    68
    best: 74.2 (FindTrack)
    SOTA
    Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation · arXiv:2305.16318
  • Video Object Segmentation on Long-RVOS
    J&F · 2023-05-25
    42.2
    best: 51.3 (ReferMo)
    SOTA
    Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation · arXiv:2305.16318
  • Video Object Segmentation on Long-RVOS
    tIoU · 2023-05-25
    70.4
    best: 71.7 (ReferDINO)
    SOTA
    Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation · arXiv:2305.16318
  • Video Object Segmentation on Long-RVOS
    vIoU · 2023-05-25
    36.2
    best: 42.6 (ReferMo)
    SOTA
    Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation · arXiv:2305.16318
  • Referring Expression Segmentation on Referring Expressions for DAVIS 2016 & 2017
    F · 2023-05-25
    71.3
    SOTA
    Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation · arXiv:2305.16318
  • Referring Expression Segmentation on Referring Expressions for DAVIS 2016 & 2017
    J · 2023-05-25
    64.8
    SOTA
    Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation · arXiv:2305.16318
  • Referring Expression Segmentation on Referring Expressions for DAVIS 2016 & 2017
    J&F 1st frame · 2023-05-25
    68
    SOTA
    Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation · arXiv:2305.16318
  • Instance Segmentation on Refer-YouTube-VOS (2021 public validation)
    F · 2023-05-25
    70.4
    best: 76.1 (MPG-SAM 2)
    Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation · arXiv:2305.16318
  • Instance Segmentation on Refer-YouTube-VOS (2021 public validation)
    J · 2023-05-25
    66.4
    best: 71.7 (MPG-SAM 2)
    Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation · arXiv:2305.16318
  • Instance Segmentation on Refer-YouTube-VOS (2021 public validation)
    J&F · 2023-05-25
    68.4
    best: 73.9 (MPG-SAM 2)
    Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation · arXiv:2305.16318
  • Referring Expression Segmentation on Refer-YouTube-VOS (2021 public validation)
    F · 2023-05-25
    70.4
    best: 76.1 (MPG-SAM 2)
    Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation · arXiv:2305.16318
  • Referring Expression Segmentation on Refer-YouTube-VOS (2021 public validation)
    J · 2023-05-25
    66.4
    best: 71.7 (MPG-SAM 2)
    Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation · arXiv:2305.16318
  • Referring Expression Segmentation on Refer-YouTube-VOS (2021 public validation)
    J&F · 2023-05-25
    68.4
    best: 73.9 (MPG-SAM 2)
    Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation · arXiv:2305.16318