Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

UMT

Reported on 13 benchmarks across 6 tasks · 1 paper · 10 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Computer Vision · 10 results

Video on QVHighlights
R@1, IoU=0.5 · 2022-03-23
56.23
best: 71.42 (InternVideo2-6B)
SOTA
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection arXiv:2203.12745
Video on QVHighlights
R@1, IoU=0.7 · 2022-03-23
41.18
best: 56.45 (InternVideo2-6B)
SOTA
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection arXiv:2203.12745
Video Retrieval on QVHighlights
R@1, IoU=0.5 · 2022-03-23
56.23
best: 71.42 (InternVideo2-6B)
SOTA
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection arXiv:2203.12745
Video Retrieval on QVHighlights
R@1, IoU=0.7 · 2022-03-23
41.18
best: 56.45 (InternVideo2-6B)
SOTA
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection arXiv:2203.12745
Highlight Detection on TvSum
mAP · 2022-03-23
83.1
best: 88 (FlashVTG)
SOTA
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection arXiv:2203.12745
Highlight Detection on YouTube Highlights
mAP · 2022-03-23
74.9
best: 78 (SG-DETR (w/ PT))
SOTA
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection arXiv:2203.12745
Video Grounding on QVHighlights
R@1, IoU=0.5 · 2022-03-23
56.23
best: 71.42 (InternVideo2-6B)
SOTA
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection arXiv:2203.12745
Video Grounding on QVHighlights
R@1, IoU=0.7 · 2022-03-23
41.18
best: 56.45 (InternVideo2-6B)
SOTA
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection arXiv:2203.12745
Moment Retrieval on QVHighlights
mAP · 2022-03-23
36.12
best: 58.8 (SG-DETR (w/ PT))
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection arXiv:2203.12745
Highlight Detection on QVHighlights
mAP · 2022-03-23
38.18
best: 44.7 (SG-DETR (w/ PT))
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection arXiv:2203.12745

Methodology · 3 results

16k on TvSum
mAP · 2022-03-23
83.1
best: 88 (FlashVTG)
SOTA
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection arXiv:2203.12745
16k on YouTube Highlights
mAP · 2022-03-23
74.9
best: 78 (SG-DETR (w/ PT))
SOTA
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection arXiv:2203.12745
16k on QVHighlights
mAP · 2022-03-23
38.18
best: 44.7 (SG-DETR (w/ PT))
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection arXiv:2203.12745