Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

MSR-VTT

Texts | Videos | Unknown | Introduced 2016-01-01

MSR-VTT (Microsoft Research Video to Text) is a large-scale dataset for open-domain video captioning. It consists of 10,000 video clips from 20 categories, and each clip is annotated with 20 English sentences by Amazon Mechanical Turk workers. The captions contain about 29,000 unique words in total. The standard split uses 6,513 clips for training, 497 clips for validation, and 2,990 clips for testing.

Source: Learning to Discretely Compose Reasoning Module Networks for Video Captioning
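As a quick sanity check on the figures quoted above, the following minimal sketch adds up the split sizes and derives the total caption count. The names `SPLITS`, `CAPTIONS_PER_CLIP`, and `split_summary` are hypothetical and chosen for illustration; the total of 200,000 captions is not stated in the description and follows only from multiplying 10,000 clips by 20 sentences per clip.

```python
# Split sizes as quoted in the dataset description.
SPLITS = {"train": 6513, "val": 497, "test": 2990}

# Each clip is annotated with 20 English sentences.
CAPTIONS_PER_CLIP = 20


def split_summary(splits=SPLITS, captions_per_clip=CAPTIONS_PER_CLIP):
    """Return total clips, derived total captions, and each split's share of clips."""
    total_clips = sum(splits.values())
    # Derived figure (assumption: every clip has exactly 20 captions).
    total_captions = total_clips * captions_per_clip
    shares = {name: count / total_clips for name, count in splits.items()}
    return total_clips, total_captions, shares


total_clips, total_captions, shares = split_summary()
print(total_clips)     # 10000
print(total_captions)  # 200000
for name, share in shares.items():
    print(f"{name}: {share:.2%}")
```

The three splits sum to exactly the 10,000 clips stated in the description, so the standard split covers the whole dataset with no clips left over.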

Benchmarks

10-shot image generation: text-to-video R@1
Text to Video Retrieval: text-to-video R@1
Text-to-Video Generation: FVD, CLIPSIM, CLIP-FID, FID
Video: FVD16, Inception score, text-to-video R@1, text-to-video R@5, text-to-video R@10, text-to-video Mean Rank, text-to-video Median Rank, video-to-text R@1, video-to-text R@5, video-to-text R@10, video-to-text Median Rank, video-to-text Mean Rank, text-to-video MedianR, text-to-videoMedian Rank
Video Captioning: CIDEr, METEOR, ROUGE-L, BLEU-4, GS
Video Generation: FVD16, Inception score
Video Question Answering: Accuracy
Video Retrieval: text-to-video R@1, text-to-video R@5, text-to-video R@10, text-to-video Mean Rank, text-to-video Median Rank, video-to-text R@1, video-to-text R@5, video-to-text R@10, video-to-text Median Rank, video-to-text Mean Rank, text-to-video MedianR, text-to-videoMedian Rank
Zero-Shot Video Retrieval: text-to-video R@1, text-to-video R@5, text-to-video R@10, text-to-video Median Rank, text-to-video Mean Rank, video-to-text R@1, video-to-text R@5, video-to-text R@10, video-to-text Median Rank
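Many of the retrieval benchmarks above report R@1, R@5, R@10, Median Rank, and Mean Rank. The sketch below shows one common way such numbers can be computed from a query-by-candidate similarity matrix. It is an illustration under stated assumptions, not the evaluation protocol of any particular leaderboard entry: the function names are hypothetical, the correct candidate for query `i` is assumed to sit at index `i`, and ties are resolved in favor of the correct candidate.

```python
from statistics import mean, median


def retrieval_ranks(sim):
    """Return the 1-based rank of the correct candidate for each query.

    sim[i][j] is the similarity of query i to candidate j.
    Assumption: the correct candidate for query i is candidate i.
    """
    ranks = []
    for i, row in enumerate(sim):
        target = row[i]
        # Rank is 1 plus the number of candidates scoring strictly higher,
        # so ties are counted in favor of the correct candidate.
        ranks.append(1 + sum(1 for score in row if score > target))
    return ranks


def retrieval_metrics(sim, ks=(1, 5, 10)):
    """Compute Recall@K (as a percentage), Median Rank, and Mean Rank."""
    ranks = retrieval_ranks(sim)
    metrics = {
        f"R@{k}": 100.0 * sum(r <= k for r in ranks) / len(ranks) for k in ks
    }
    metrics["Median Rank"] = median(ranks)
    metrics["Mean Rank"] = mean(ranks)
    return metrics


# Toy 3x3 similarity matrix: rows are text queries, columns are videos.
toy = [
    [0.9, 0.1, 0.3],
    [0.8, 0.2, 0.1],
    [0.2, 0.4, 0.7],
]
print(retrieval_ranks(toy))    # [1, 2, 1]
print(retrieval_metrics(toy))
```

With rows as text queries and columns as videos, this gives text-to-video numbers; transposing the matrix gives the video-to-text direction. Higher is better for R@K, while lower is better for Median Rank and Mean Rank.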

Related Benchmarks

MSR-VTT Adverbs / Video: Acc-A, mAP M, mAP W
MSR-VTT Adverbs / Video Retrieval: Acc-A, mAP M, mAP W
MSR-VTT Adverbs / Video-Adverb Retrieval: Acc-A, mAP M, mAP W
MSR-VTT-1kA / Video: text-to-video Mean Rank, text-to-video Median Rank, text-to-video R@1, text-to-video R@10, text-to-video R@5, video-to-text Mean Rank, video-to-text Median Rank, video-to-text R@1, video-to-text R@10, video-to-text R@5
MSR-VTT-1kA / Video Retrieval: text-to-video Mean Rank, text-to-video Median Rank, text-to-video R@1, text-to-video R@10, text-to-video R@5, video-to-text Mean Rank, video-to-text Median Rank, video-to-text R@1, video-to-text R@10, video-to-text R@5
MSR-VTT-MC / Video Question Answering: Accuracy
MSR-VTT-full / Zero-Shot Video Retrieval: text-to-video R@1, text-to-video R@10, text-to-video R@5, video-to-text R@1, video-to-text R@10, video-to-text R@5

Statistics

Papers: 640
Benchmarks: 49

Links

Homepage

Tasks

10-shot image generation
Text to Video Retrieval
Text-to-Video Generation
Video
Video Captioning
Video Generation
Video Question Answering
Video Retrieval
Zero-Shot Video Retrieval
Zero-Shot Video-Audio Retrieval