Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Vid2Seq

Reported on 32 benchmarks across 3 tasks · 2 papers · 24 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
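
The exact-name matching described in the note above can be illustrated with a minimal sketch. The record layout, field names, and helper functions here are assumptions made for illustration, not the site's actual data model; the values are taken from entries on this page.

```python
from collections import defaultdict

# Hypothetical result records; the field names are assumptions.
records = [
    {"model": "Vid2Seq", "paper": "arXiv:2302.14115",
     "benchmark": "YouCook2", "metric": "CIDEr", "value": 47.1},
    {"model": "Vid2Seq", "paper": "arXiv:2309.13952",
     "benchmark": "VidChapters-7M", "metric": "CIDEr", "value": 55.7},
    # A differently named variant; its paper is not stated on this page.
    {"model": "Vid2Seq (VidChapters-7M PT)", "paper": None,
     "benchmark": "ViTT", "metric": "SODA", "value": 9.1},
]


def group_by_exact_name(rows):
    """Group rows by the exact, case-sensitive model name string."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["model"]].append(row)
    return dict(groups)


def papers_for(groups, name):
    """Return the sorted distinct papers that report results under `name`."""
    return sorted({row["paper"] for row in groups.get(name, []) if row["paper"]})


groups = group_by_exact_name(records)
print(papers_for(groups, "Vid2Seq"))
```

Under this sketch, the two papers that both use the name "Vid2Seq" land in one group even though they may describe different variants, while "Vid2Seq (VidChapters-7M PT)" is kept separate because its name string differs.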

Computer Vision · 33 results

Video Captioning on VidChapters-7M
CIDEr · uses extra data · 2023-09-25
55.7
best: 120.5
SOTA
VidChapters-7M: Video Chapters at Scale arXiv:2309.13952
Dense Video Captioning on VidChapters-7M
CIDEr · uses extra data · 2023-09-25
55.7
SOTA
VidChapters-7M: Video Chapters at Scale arXiv:2309.13952
Video Chaptering on VidChapters-7M
CIDEr · 2023-09-25
55.7
SOTA
VidChapters-7M: Video Chapters at Scale arXiv:2309.13952
Video Chaptering on VidChapters-7M
P@0.5 · 2023-09-25
43.1
SOTA
VidChapters-7M: Video Chapters at Scale arXiv:2309.13952
Video Chaptering on VidChapters-7M
P@0.7 · 2023-09-25
26.4
SOTA
VidChapters-7M: Video Chapters at Scale arXiv:2309.13952
Video Chaptering on VidChapters-7M
P@3s · 2023-09-25
24
SOTA
VidChapters-7M: Video Chapters at Scale arXiv:2309.13952
Video Chaptering on VidChapters-7M
P@5s · 2023-09-25
30.3
best: 52 (Chapter-Llama)
SOTA
VidChapters-7M: Video Chapters at Scale arXiv:2309.13952
Video Chaptering on VidChapters-7M
R@0.5 · 2023-09-25
48.2
SOTA
VidChapters-7M: Video Chapters at Scale arXiv:2309.13952
Video Chaptering on VidChapters-7M
R@0.7 · 2023-09-25
28.5
SOTA
VidChapters-7M: Video Chapters at Scale arXiv:2309.13952
Video Chaptering on VidChapters-7M
R@3s · 2023-09-25
28.5
SOTA
VidChapters-7M: Video Chapters at Scale arXiv:2309.13952
Video Chaptering on VidChapters-7M
R@5s · 2023-09-25
36.4
SOTA
VidChapters-7M: Video Chapters at Scale arXiv:2309.13952
Video Chaptering on VidChapters-7M
SODA · 2023-09-25
0.114
SOTA
VidChapters-7M: Video Chapters at Scale arXiv:2309.13952
Video Captioning on YouCook2
CIDEr · uses extra data · 2023-02-27
47.1
best: 116.4 (HowToCaption)
SOTA
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning arXiv:2302.14115
Video Captioning on YouCook2
SODA · uses extra data · 2023-02-27
7.9
best: 10.73 (HiCM²)
SOTA
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning arXiv:2302.14115
Video Captioning on ViTT
CIDEr · uses extra data · 2023-02-27
43.5
best: 51.2 (HiCM²)
SOTA
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning arXiv:2302.14115
Video Captioning on ViTT
METEOR · uses extra data · 2023-02-27
8.5
best: 9.6 (HiCM²)
SOTA
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning arXiv:2302.14115
Video Captioning on ViTT
SODA · uses extra data · 2023-02-27
0.135
best: 9.1 (Vid2Seq (VidChapters-7M PT))
SOTA
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning arXiv:2302.14115
Dense Video Captioning on YouCook2
CIDEr · uses extra data · 2023-02-27
47.1
best: 71.84 (HiCM²)
SOTA
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning arXiv:2302.14115
Dense Video Captioning on YouCook2
METEOR · uses extra data · 2023-02-27
9.3
best: 12.8 (HiCM²)
SOTA
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning arXiv:2302.14115
Dense Video Captioning on YouCook2
SODA · uses extra data · 2023-02-27
7.9
best: 10.73 (HiCM²)
SOTA
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning arXiv:2302.14115
Dense Video Captioning on ViTT
CIDEr · uses extra data · 2023-02-27
43.5
best: 51.2 (HiCM²)
SOTA
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning arXiv:2302.14115
Dense Video Captioning on ViTT
METEOR · uses extra data · 2023-02-27
8.5
best: 9.6 (HiCM²)
SOTA
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning arXiv:2302.14115
Dense Video Captioning on ViTT
SODA · uses extra data · 2023-02-27
0.135
best: 9.1 (Vid2Seq (VidChapters-7M PT))
SOTA
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning arXiv:2302.14115
Dense Video Captioning on ActivityNet Captions
METEOR · uses extra data · 2023-02-27
17
SOTA
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning arXiv:2302.14115
Video Captioning on MSR-VTT
CIDEr · uses extra data · 2023-02-27
64.6
best: 80 (mPLUG-2)
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning arXiv:2302.14115
Video Captioning on MSR-VTT
METEOR · uses extra data · 2023-02-27
30.8
best: 38.7 (MV-GPT)
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning arXiv:2302.14115
Video Captioning on MSVD
CIDEr · uses extra data · 2023-02-27
146.2
best: 195.6 (MaMMUT)
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning arXiv:2302.14115
Video Captioning on MSVD
METEOR · uses extra data · 2023-02-27
45.3
best: 51.2 (VLAB)
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning arXiv:2302.14115
Video Captioning on YouCook2
METEOR · uses extra data · 2023-02-27
9.3
best: 22.56 (UniVL + MELTR)
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning arXiv:2302.14115
Video Captioning on ActivityNet Captions
CIDEr · uses extra data · 2023-02-27
28
best: 39.3 (VideoCoCa)
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning arXiv:2302.14115
Video Captioning on ActivityNet Captions
METEOR · uses extra data · 2023-02-27
17
best: 17.97 (VLTinT (ae-test split) C3D/Ling)
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning arXiv:2302.14115
Dense Video Captioning on ActivityNet Captions
CIDEr · uses extra data · 2023-02-27
28
best: 33.33 (GVL)
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning arXiv:2302.14115
Video Captioning on VidChapters-7M
CIDEr · uses extra data
120.5