Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Video Captioning on MSVD

Metric: METEOR (higher is better)

Results

| # | Model | METEOR | Extra Data | Paper | Date | Code |
|---|-------|--------|------------|-------|------|------|
| 1 | VLAB | 51.2 | Yes | VLAB: Enhancing Video Language Pre-training by F... | 2023-05-22 | - |
| 2 | VALOR | 51 | Yes | VALOR: Vision-Audio-Language Omni-Perception Pre... | 2023-04-17 | Code |
| 3 | mPLUG-2 | 48.4 | No | mPLUG-2: A Modularized Multi-modal Foundation Mo... | 2023-02-01 | Code |
| 4 | HowToCaption | 46.4 | No | HowToCaption: Prompting LLMs to Transform Video ... | 2023-10-07 | Code |
| 5 | HiTeA | 45.3 | Yes | HiTeA: Hierarchical Temporal-Aware Video-Languag... | 2022-12-30 | - |
| 6 | Vid2Seq | 45.3 | Yes | Vid2Seq: Large-Scale Pretraining of a Visual Lan... | 2023-02-27 | Code |
| 7 | CoCap (ViT/L14) | 41.4 | No | Accurate and Fast Compressed Video Captioning | 2023-09-22 | Code |
| 8 | VASTA (Vatex-backbone) | 40.65 | No | Diverse Video Captioning by Adaptive Spatio-temp... | 2022-08-19 | Code |
| 9 | IcoCap (ViT-B/16) | 39.5 | Yes | - | - | - |
| 10 | VASTA (Kinetics-backbone) | 39.1 | No | Diverse Video Captioning by Adaptive Spatio-temp... | 2022-08-19 | Code |
| 11 | IcoCap (ViT-B/32) | 38.9 | Yes | - | - | - |
| 12 | SEM-POS | 38.5 | No | SEM-POS: Grammatically and Semantically Correct ... | 2023-03-26 | - |