Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

UMT

Reported on 13 benchmarks across 6 tasks · 1 paper · 10 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Computer Vision: 10 results

  • Video on QVHighlights
    R@1, IoU=0.5 · 2022-03-23
    56.23
    best: 71.42 (InternVideo2-6B)
    SOTA
    UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection (arXiv:2203.12745)
  • Video on QVHighlights
    R@1, IoU=0.7 · 2022-03-23
    41.18
    best: 56.45 (InternVideo2-6B)
    SOTA
    UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection (arXiv:2203.12745)
  • Video Retrieval on QVHighlights
    R@1, IoU=0.5 · 2022-03-23
    56.23
    best: 71.42 (InternVideo2-6B)
    SOTA
    UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection (arXiv:2203.12745)
  • Video Retrieval on QVHighlights
    R@1, IoU=0.7 · 2022-03-23
    41.18
    best: 56.45 (InternVideo2-6B)
    SOTA
    UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection (arXiv:2203.12745)
  • Highlight Detection on TvSum
    mAP · 2022-03-23
    83.1
    best: 88 (FlashVTG)
    SOTA
    UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection (arXiv:2203.12745)
  • Highlight Detection on YouTube Highlights
    mAP · 2022-03-23
    74.9
    best: 78 (SG-DETR (w/ PT))
    SOTA
    UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection (arXiv:2203.12745)
  • Video Grounding on QVHighlights
    R@1, IoU=0.5 · 2022-03-23
    56.23
    best: 71.42 (InternVideo2-6B)
    SOTA
    UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection (arXiv:2203.12745)
  • Video Grounding on QVHighlights
    R@1, IoU=0.7 · 2022-03-23
    41.18
    best: 56.45 (InternVideo2-6B)
    SOTA
    UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection (arXiv:2203.12745)
  • Moment Retrieval on QVHighlights
    mAP · 2022-03-23
    36.12
    best: 58.8 (SG-DETR (w/ PT))
    UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection (arXiv:2203.12745)
  • Highlight Detection on QVHighlights
    mAP · 2022-03-23
    38.18
    best: 44.7 (SG-DETR (w/ PT))
    UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection (arXiv:2203.12745)

Methodology: 3 results

  • 16k on TvSum
    mAP · 2022-03-23
    83.1
    best: 88 (FlashVTG)
    SOTA
    UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection (arXiv:2203.12745)
  • 16k on YouTube Highlights
    mAP · 2022-03-23
    74.9
    best: 78 (SG-DETR (w/ PT))
    SOTA
    UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection (arXiv:2203.12745)
  • 16k on QVHighlights
    mAP · 2022-03-23
    38.18
    best: 44.7 (SG-DETR (w/ PT))
    UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection (arXiv:2203.12745)