Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

TarViS (Swin-T)

Reported on 29 benchmarks across 4 tasks · 1 paper

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Computer Vision · 13 results

  • Video Instance Segmentation on YouTube-VIS 2021
    AP50 · uses extra data · 2023-01-06
    71.6
    best: 87.3 (CAVIS(VIT-L, Offline))
    TarViS: A Unified Approach for Target-based Video Segmentation · arXiv:2301.02657
  • Video Instance Segmentation on YouTube-VIS 2021
    AP75 · uses extra data · 2023-01-06
    56.6
    best: 73.2 (CAVIS(VIT-L, Offline))
    TarViS: A Unified Approach for Target-based Video Segmentation · arXiv:2301.02657
  • Video Instance Segmentation on YouTube-VIS 2021
    AR1 · uses extra data · 2023-01-06
    42.2
    best: 49.7 (CAVIS(VIT-L, Offline))
    TarViS: A Unified Approach for Target-based Video Segmentation · arXiv:2301.02657
  • Video Instance Segmentation on YouTube-VIS 2021
    AR10 · uses extra data · 2023-01-06
    57.2
    best: 70.7 (DVIS-DAQ(VIT-L, Offline))
    TarViS: A Unified Approach for Target-based Video Segmentation · arXiv:2301.02657
  • Video Instance Segmentation on YouTube-VIS 2021
    mask AP · uses extra data · 2023-01-06
    50.9
    best: 65.3 (CAVIS(VIT-L, Offline))
    TarViS: A Unified Approach for Target-based Video Segmentation · arXiv:2301.02657
  • Panoptic Segmentation on Cityscapes-VPS
    VPQ · uses extra data · 2023-01-06
    58
    best: 63.1 (VIP-Deeplab)
    TarViS: A Unified Approach for Target-based Video Segmentation · arXiv:2301.02657
  • Panoptic Segmentation on Cityscapes-VPS
    VPQ (stuff) · uses extra data · 2023-01-06
    69
    best: 73 (VIP-Deeplab)
    TarViS: A Unified Approach for Target-based Video Segmentation · arXiv:2301.02657
  • Panoptic Segmentation on Cityscapes-VPS
    VPQ (thing) · uses extra data · 2023-01-06
    42.9
    best: 49.8 (Video K-Net (Swin-B))
    TarViS: A Unified Approach for Target-based Video Segmentation · arXiv:2301.02657
  • Panoptic Segmentation on VIPSeg
    STQ · uses extra data · 2023-01-06
    45.3
    best: 58.2 (UniVS(Swin-L))
    TarViS: A Unified Approach for Target-based Video Segmentation · arXiv:2301.02657
  • Panoptic Segmentation on VIPSeg
    VPQ · uses extra data · 2023-01-06
    35.8
    best: 58.5 (CAVIS(VIT-L))
    TarViS: A Unified Approach for Target-based Video Segmentation · arXiv:2301.02657
  • Panoptic Segmentation on KITTI-STEP
    AQ · uses extra data · 2023-01-06
    71.2
    best: 73 (Video K-Net (Swin-L))
    TarViS: A Unified Approach for Target-based Video Segmentation · arXiv:2301.02657
  • Panoptic Segmentation on KITTI-STEP
    SQ · uses extra data · 2023-01-06
    69.9
    best: 75 (Video K-Net (Swin-L))
    TarViS: A Unified Approach for Target-based Video Segmentation · arXiv:2301.02657
  • Panoptic Segmentation on KITTI-STEP
    STQ · uses extra data · 2023-01-06
    70.6
    best: 74 (Video K-Net (Swin-L))
    TarViS: A Unified Approach for Target-based Video Segmentation · arXiv:2301.02657

Medical · 8 results

  • Semantic Segmentation on Cityscapes-VPS
    VPQ · uses extra data · 2023-01-06
    58
    best: 63.1 (VIP-Deeplab)
    TarViS: A Unified Approach for Target-based Video Segmentation · arXiv:2301.02657
  • Semantic Segmentation on Cityscapes-VPS
    VPQ (stuff) · uses extra data · 2023-01-06
    69
    best: 73 (VIP-Deeplab)
    TarViS: A Unified Approach for Target-based Video Segmentation · arXiv:2301.02657
  • Semantic Segmentation on Cityscapes-VPS
    VPQ (thing) · uses extra data · 2023-01-06
    42.9
    best: 49.8 (Video K-Net (Swin-B))
    TarViS: A Unified Approach for Target-based Video Segmentation · arXiv:2301.02657
  • Semantic Segmentation on VIPSeg
    STQ · uses extra data · 2023-01-06
    45.3
    best: 58.2 (UniVS(Swin-L))
    TarViS: A Unified Approach for Target-based Video Segmentation · arXiv:2301.02657
  • Semantic Segmentation on VIPSeg
    VPQ · uses extra data · 2023-01-06
    35.8
    best: 58.5 (CAVIS(VIT-L))
    TarViS: A Unified Approach for Target-based Video Segmentation · arXiv:2301.02657
  • Semantic Segmentation on KITTI-STEP
    AQ · uses extra data · 2023-01-06
    71.2
    best: 73 (Video K-Net (Swin-L))
    TarViS: A Unified Approach for Target-based Video Segmentation · arXiv:2301.02657
  • Semantic Segmentation on KITTI-STEP
    SQ · uses extra data · 2023-01-06
    69.9
    best: 75 (Video K-Net (Swin-L))
    TarViS: A Unified Approach for Target-based Video Segmentation · arXiv:2301.02657
  • Semantic Segmentation on KITTI-STEP
    STQ · uses extra data · 2023-01-06
    70.6
    best: 74 (Video K-Net (Swin-L))
    TarViS: A Unified Approach for Target-based Video Segmentation · arXiv:2301.02657

Audio · 8 results

  • 10-shot image generation on Cityscapes-VPS
    VPQ · uses extra data · 2023-01-06
    58
    best: 63.1 (VIP-Deeplab)
    TarViS: A Unified Approach for Target-based Video Segmentation · arXiv:2301.02657
  • 10-shot image generation on Cityscapes-VPS
    VPQ (stuff) · uses extra data · 2023-01-06
    69
    best: 73 (VIP-Deeplab)
    TarViS: A Unified Approach for Target-based Video Segmentation · arXiv:2301.02657
  • 10-shot image generation on Cityscapes-VPS
    VPQ (thing) · uses extra data · 2023-01-06
    42.9
    best: 49.8 (Video K-Net (Swin-B))
    TarViS: A Unified Approach for Target-based Video Segmentation · arXiv:2301.02657
  • 10-shot image generation on VIPSeg
    STQ · uses extra data · 2023-01-06
    45.3
    best: 58.2 (UniVS(Swin-L))
    TarViS: A Unified Approach for Target-based Video Segmentation · arXiv:2301.02657
  • 10-shot image generation on VIPSeg
    VPQ · uses extra data · 2023-01-06
    35.8
    best: 58.5 (CAVIS(VIT-L))
    TarViS: A Unified Approach for Target-based Video Segmentation · arXiv:2301.02657
  • 10-shot image generation on KITTI-STEP
    AQ · uses extra data · 2023-01-06
    71.2
    best: 73 (Video K-Net (Swin-L))
    TarViS: A Unified Approach for Target-based Video Segmentation · arXiv:2301.02657
  • 10-shot image generation on KITTI-STEP
    SQ · uses extra data · 2023-01-06
    69.9
    best: 75 (Video K-Net (Swin-L))
    TarViS: A Unified Approach for Target-based Video Segmentation · arXiv:2301.02657
  • 10-shot image generation on KITTI-STEP
    STQ · uses extra data · 2023-01-06
    70.6
    best: 74 (Video K-Net (Swin-L))
    TarViS: A Unified Approach for Target-based Video Segmentation · arXiv:2301.02657