Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Video Captioning on YouCook2

Metric: ROUGE-L (higher is better)
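ROUGE-L scores a candidate caption against a reference by the length of their longest common subsequence (LCS), combined into an F-measure. As a rough sketch (not the exact scorer used for this leaderboard; `beta=1.2` is a common choice, not a value stated here):

```python
def lcs_len(a, b):
    # Classic dynamic-programming longest-common-subsequence length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l(candidate, reference, beta=1.2):
    # Token-level ROUGE-L F-measure between two caption strings.
    c, r = candidate.split(), reference.split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(c), lcs / len(r)
    # F_lcs weights recall by beta^2, per the standard ROUGE-L formula.
    return (1 + beta**2) * prec * rec / (rec + beta**2 * prec)
```

An identical candidate and reference score 1.0; partial word overlap in order scores between 0 and 1, which is why the leaderboard values below fall in the 27-47 range (reported on a 0-100 scale).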


Results

| # | Model | ROUGE-L | Extra Data | Paper | Date | Code |
|---|-------|---------|------------|-------|------|------|
| 1 | UniVL + MELTR | 47.04 | No | MELTR: Meta Loss Transformer for Learning to Fin... | 2023-03-23 | Code |
| 2 | UniVL | 46.52 | Yes | UniVL: A Unified Video and Language Pre-Training... | 2020-02-15 | Code |
| 3 | VLM | 41.51 | Yes | VLM: Task-agnostic Video-Language Model Pre-trai... | 2021-05-20 | Code |
| 4 | TextKG | 40.2 | No | Text with Knowledge Graph Augmented Transformer ... | 2023-03-22 | - |
| 5 | E2vidD6-MASSvid-BiD | 39.03 | Yes | Multimodal Pretraining for Dense Video Captioning | 2020-11-10 | Code |
| 6 | E2vidD6-MASSalign-BiD | 39.03 | Yes | Multimodal Pretraining for Dense Video Captioning | 2020-11-10 | Code |
| 7 | COOT | 37.94 | Yes | COOT: Cooperative Hierarchical Transformer for V... | 2020-11-01 | Code |
| 8 | VideoCoCa | 37.7 | Yes | VideoCoCa: Video-Text Modeling with Zero-Shot Tr... | 2022-12-09 | - |
| 9 | HowToCaption | 37.3 | No | HowToCaption: Prompting LLMs to Transform Video ... | 2023-10-07 | Code |
| 10 | OmniVL | 36.09 | No | OmniVL: One Foundation Model for Image-Language a... | 2022-09-15 | - |
| 11 | VideoBERT + S3D | 28.8 | No | VideoBERT: A Joint Model for Video and Language ... | 2019-04-03 | Code |
| 12 | Zhou | 27.44 | No | End-to-End Dense Video Captioning with Masked Tr... | 2018-04-03 | Code |