Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

TarViS (Swin-L)

Reported on 34 benchmarks across 4 tasks · 1 paper · 11 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Computer Vision · 18 results

  • Video Instance Segmentation on YouTube-VIS 2021
    AP50 · uses extra data · 2023-01-06
    81.4
    best: 87.3 (CAVIS(VIT-L, Offline))
    SOTA
    TarViS: A Unified Approach for Target-based Video Segmentation (arXiv:2301.02657)
  • Video Instance Segmentation on YouTube-VIS 2021
    AP75 · uses extra data · 2023-01-06
    67.6
    best: 73.2 (CAVIS(VIT-L, Offline))
    SOTA
    TarViS: A Unified Approach for Target-based Video Segmentation (arXiv:2301.02657)
  • Video Instance Segmentation on YouTube-VIS 2021
    AR10 · uses extra data · 2023-01-06
    64.8
    best: 70.7 (DVIS-DAQ(VIT-L, Offline))
    SOTA
    TarViS: A Unified Approach for Target-based Video Segmentation (arXiv:2301.02657)
  • Video Instance Segmentation on YouTube-VIS 2021
    mask AP · uses extra data · 2023-01-06
    60.2
    best: 65.3 (CAVIS(VIT-L, Offline))
    SOTA
    TarViS: A Unified Approach for Target-based Video Segmentation (arXiv:2301.02657)
  • Video Instance Segmentation on OVIS validation
    AR10 · uses extra data · 2023-01-06
    50.4
    best: 61.8 (CAVIS(VIT-L, Offline))
    SOTA
    TarViS: A Unified Approach for Target-based Video Segmentation (arXiv:2301.02657)
  • Panoptic Segmentation on VIPSeg
    STQ · uses extra data · 2023-01-06
    52.9
    best: 58.2 (UniVS(Swin-L))
    SOTA
    TarViS: A Unified Approach for Target-based Video Segmentation (arXiv:2301.02657)
  • Panoptic Segmentation on VIPSeg
    VPQ · uses extra data · 2023-01-06
    48
    best: 58.5 (CAVIS(VIT-L))
    SOTA
    TarViS: A Unified Approach for Target-based Video Segmentation (arXiv:2301.02657)
  • Video Instance Segmentation on YouTube-VIS 2021
    AR1 · uses extra data · 2023-01-06
    47.6
    best: 49.7 (CAVIS(VIT-L, Offline))
    TarViS: A Unified Approach for Target-based Video Segmentation (arXiv:2301.02657)
  • Video Instance Segmentation on OVIS validation
    AP50 · uses extra data · 2023-01-06
    67.8
    best: 83.8 (DVIS-DAQ(VIT-L, Offline))
    TarViS: A Unified Approach for Target-based Video Segmentation (arXiv:2301.02657)
  • Video Instance Segmentation on OVIS validation
    AP75 · uses extra data · 2023-01-06
    44.6
    best: 63.5 (CAVIS(VIT-L, Offline))
    TarViS: A Unified Approach for Target-based Video Segmentation (arXiv:2301.02657)
  • Video Instance Segmentation on OVIS validation
    AR1 · uses extra data · 2023-01-06
    18
    best: 21.2 (CAVIS(VIT-L, Offline))
    TarViS: A Unified Approach for Target-based Video Segmentation (arXiv:2301.02657)
  • Video Instance Segmentation on OVIS validation
    mask AP · uses extra data · 2023-01-06
    43.2
    best: 57.1 (DVIS-DAQ(VIT-L, Offline))
    TarViS: A Unified Approach for Target-based Video Segmentation (arXiv:2301.02657)
  • Panoptic Segmentation on Cityscapes-VPS
    VPQ · uses extra data · 2023-01-06
    58.9
    best: 63.1 (VIP-Deeplab)
    TarViS: A Unified Approach for Target-based Video Segmentation (arXiv:2301.02657)
  • Panoptic Segmentation on Cityscapes-VPS
    VPQ (stuff) · uses extra data · 2023-01-06
    69.9
    best: 73 (VIP-Deeplab)
    TarViS: A Unified Approach for Target-based Video Segmentation (arXiv:2301.02657)
  • Panoptic Segmentation on Cityscapes-VPS
    VPQ (thing) · uses extra data · 2023-01-06
    43.7
    best: 49.8 (Video K-Net (Swin-B))
    TarViS: A Unified Approach for Target-based Video Segmentation (arXiv:2301.02657)
  • Panoptic Segmentation on KITTI-STEP
    AQ · uses extra data · 2023-01-06
    72
    best: 73 (Video K-Net (Swin-L))
    TarViS: A Unified Approach for Target-based Video Segmentation (arXiv:2301.02657)
  • Panoptic Segmentation on KITTI-STEP
    SQ · uses extra data · 2023-01-06
    72
    best: 75 (Video K-Net (Swin-L))
    TarViS: A Unified Approach for Target-based Video Segmentation (arXiv:2301.02657)
  • Panoptic Segmentation on KITTI-STEP
    STQ · uses extra data · 2023-01-06
    73
    best: 74 (Video K-Net (Swin-L))
    TarViS: A Unified Approach for Target-based Video Segmentation (arXiv:2301.02657)

Medical · 8 results

  • Semantic Segmentation on VIPSeg
    STQ · uses extra data · 2023-01-06
    52.9
    best: 58.2 (UniVS(Swin-L))
    SOTA
    TarViS: A Unified Approach for Target-based Video Segmentation (arXiv:2301.02657)
  • Semantic Segmentation on VIPSeg
    VPQ · uses extra data · 2023-01-06
    48
    best: 58.5 (CAVIS(VIT-L))
    SOTA
    TarViS: A Unified Approach for Target-based Video Segmentation (arXiv:2301.02657)
  • Semantic Segmentation on Cityscapes-VPS
    VPQ · uses extra data · 2023-01-06
    58.9
    best: 63.1 (VIP-Deeplab)
    TarViS: A Unified Approach for Target-based Video Segmentation (arXiv:2301.02657)
  • Semantic Segmentation on Cityscapes-VPS
    VPQ (stuff) · uses extra data · 2023-01-06
    69.9
    best: 73 (VIP-Deeplab)
    TarViS: A Unified Approach for Target-based Video Segmentation (arXiv:2301.02657)
  • Semantic Segmentation on Cityscapes-VPS
    VPQ (thing) · uses extra data · 2023-01-06
    43.7
    best: 49.8 (Video K-Net (Swin-B))
    TarViS: A Unified Approach for Target-based Video Segmentation (arXiv:2301.02657)
  • Semantic Segmentation on KITTI-STEP
    AQ · uses extra data · 2023-01-06
    72
    best: 73 (Video K-Net (Swin-L))
    TarViS: A Unified Approach for Target-based Video Segmentation (arXiv:2301.02657)
  • Semantic Segmentation on KITTI-STEP
    SQ · uses extra data · 2023-01-06
    72
    best: 75 (Video K-Net (Swin-L))
    TarViS: A Unified Approach for Target-based Video Segmentation (arXiv:2301.02657)
  • Semantic Segmentation on KITTI-STEP
    STQ · uses extra data · 2023-01-06
    73
    best: 74 (Video K-Net (Swin-L))
    TarViS: A Unified Approach for Target-based Video Segmentation (arXiv:2301.02657)

Audio · 8 results

  • 10-shot image generation on VIPSeg
    STQ · uses extra data · 2023-01-06
    52.9
    best: 58.2 (UniVS(Swin-L))
    SOTA
    TarViS: A Unified Approach for Target-based Video Segmentation (arXiv:2301.02657)
  • 10-shot image generation on VIPSeg
    VPQ · uses extra data · 2023-01-06
    48
    best: 58.5 (CAVIS(VIT-L))
    SOTA
    TarViS: A Unified Approach for Target-based Video Segmentation (arXiv:2301.02657)
  • 10-shot image generation on Cityscapes-VPS
    VPQ · uses extra data · 2023-01-06
    58.9
    best: 63.1 (VIP-Deeplab)
    TarViS: A Unified Approach for Target-based Video Segmentation (arXiv:2301.02657)
  • 10-shot image generation on Cityscapes-VPS
    VPQ (stuff) · uses extra data · 2023-01-06
    69.9
    best: 73 (VIP-Deeplab)
    TarViS: A Unified Approach for Target-based Video Segmentation (arXiv:2301.02657)
  • 10-shot image generation on Cityscapes-VPS
    VPQ (thing) · uses extra data · 2023-01-06
    43.7
    best: 49.8 (Video K-Net (Swin-B))
    TarViS: A Unified Approach for Target-based Video Segmentation (arXiv:2301.02657)
  • 10-shot image generation on KITTI-STEP
    AQ · uses extra data · 2023-01-06
    72
    best: 73 (Video K-Net (Swin-L))
    TarViS: A Unified Approach for Target-based Video Segmentation (arXiv:2301.02657)
  • 10-shot image generation on KITTI-STEP
    SQ · uses extra data · 2023-01-06
    72
    best: 75 (Video K-Net (Swin-L))
    TarViS: A Unified Approach for Target-based Video Segmentation (arXiv:2301.02657)
  • 10-shot image generation on KITTI-STEP
    STQ · uses extra data · 2023-01-06
    73
    best: 74 (Video K-Net (Swin-L))
    TarViS: A Unified Approach for Target-based Video Segmentation (arXiv:2301.02657)