Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Video Captioning on MSR-VTT

Metric: METEOR (higher is better)

Results

| # | Model | METEOR | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | MV-GPT | 38.7 | Yes | End-to-end Generative Pretraining for Multimodal... | 2022-01-20 | - |
| 2 | mPLUG-2 | 34.9 | No | mPLUG-2: A Modularized Multi-modal Foundation Mo... | 2023-02-01 | Code |
| 3 | VLAB | 33.4 | Yes | VLAB: Enhancing Video Language Pre-training by F... | 2023-05-22 | - |
| 4 | GIT2 | 33.1 | Yes | GIT: A Generative Image-to-text Transformer for ... | 2022-05-27 | Code |
| 5 | VALOR | 32.9 | Yes | VALOR: Vision-Audio-Language Omni-Perception Pre... | 2023-04-17 | Code |
| 6 | HowToCaption | 32.2 | No | HowToCaption: Prompting LLMs to Transform Video ... | 2023-10-07 | Code |
| 7 | CLIP-DCD | 31.3 | No | CLIP Meets Video Captioning: Concept-Aware Repre... | 2021-11-30 | Code |
| 8 | IcoCap (ViT-B/16) | 31.1 | Yes | - | - | - |
| 9 | Vid2Seq | 30.8 | Yes | Vid2Seq: Large-Scale Pretraining of a Visual Lan... | 2023-02-27 | Code |
| 10 | HiTeA | 30.7 | Yes | HiTeA: Hierarchical Temporal-Aware Video-Languag... | 2022-12-30 | - |
| 11 | SEM-POS | 30.7 | No | SEM-POS: Grammatically and Semantically Correct ... | 2023-03-26 | - |
| 12 | TextKG | 30.5 | No | Text with Knowledge Graph Augmented Transformer ... | 2023-03-22 | - |
| 13 | IcoCap (ViT-B/32) | 30.3 | Yes | - | - | - |
| 14 | CoCap (ViT/L14) | 30.3 | No | Accurate and Fast Compressed Video Captioning | 2023-09-22 | Code |
| 15 | VASTA (Vatex-backbone) | 30.24 | No | Diverse Video Captioning by Adaptive Spatio-temp... | 2022-08-19 | Code |
| 16 | VASTA (Kinetics-backbone) | 30.2 | No | Diverse Video Captioning by Adaptive Spatio-temp... | 2022-08-19 | Code |
| 17 | EMCL-Net | 30.2 | No | Expectation-Maximization Contrastive Learning fo... | 2022-11-21 | Code |
| 18 | UniVL + MELTR | 29.26 | No | MELTR: Meta Loss Transformer for Learning to Fin... | 2023-03-23 | Code |