Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Video Captioning on YouCook2

Metric: BLEU-4 (higher is better)

Results

| # | Model | BLEU-4 | Extra Data | Paper | Date | Code |
|---|-------|--------|------------|-------|------|------|
| 1 | VAST | 18.2 | Yes | VAST: A Vision-Audio-Subtitle-Text Omni-Modality... | 2023-05-29 | Code |
| 2 | UniVL + MELTR | 17.92 | No | MELTR: Meta Loss Transformer for Learning to Fin... | 2023-03-23 | Code |
| 3 | UniVL | 17.35 | Yes | UniVL: A Unified Video and Language Pre-Training... | 2020-02-15 | Code |
| 4 | VideoCoCa | 14.2 | Yes | VideoCoCa: Video-Text Modeling with Zero-Shot Tr... | 2022-12-09 | - |
| 5 | VLM | 12.27 | Yes | VLM: Task-agnostic Video-Language Model Pre-trai... | 2021-05-20 | Code |
| 6 | E2vidD6-MASSvid-BiD | 12.04 | Yes | Multimodal Pretraining for Dense Video Captioning | 2020-11-10 | Code |
| 7 | TextKG | 11.7 | No | Text with Knowledge Graph Augmented Transformer ... | 2023-03-22 | - |
| 8 | COOT | 11.3 | Yes | COOT: Cooperative Hierarchical Transformer for V... | 2020-11-01 | Code |
| 9 | COSA | 10.1 | Yes | COSA: Concatenated Sample Pretrained Vision-Lang... | 2023-06-15 | Code |
| 10 | HowToCaption | 8.8 | No | HowToCaption: Prompting LLMs to Transform Video ... | 2023-10-07 | Code |
| 11 | OmniVL | 8.72 | No | OmniVL:One Foundation Model for Image-Language a... | 2022-09-15 | - |
| 12 | Zhou | 4.38 | No | End-to-End Dense Video Captioning with Masked Tr... | 2018-04-03 | Code |
| 13 | VideoBERT + S3D | 4.33 | No | VideoBERT: A Joint Model for Video and Language ... | 2019-04-03 | Code |