Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

ActivityNet Captions

Texts | Videos | Unknown | Introduced 2017-01-01

The ActivityNet Captions dataset is built on ActivityNet v1.3, which includes 20k untrimmed YouTube videos with 100k caption annotations. The videos are 120 seconds long on average. Most videos contain more than 3 annotated events, each with a start/end time and a human-written sentence; the sentences contain 13.5 words on average. The train/validation/test splits contain 10,024/4,926/5,044 videos, respectively.

Source: Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning
Image Source: https://cs.stanford.edu/people/ranjaykrishna/densevid/
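
The description above says each video carries several events, each with a start/end time and a sentence. The following is a minimal Python sketch of how such a record might be handled. It assumes a JSON-like layout with `duration`, `timestamps`, and `sentences` keys; those key names, the `Event` class, the helper functions, and the example record are all illustrative assumptions, not something this page specifies. The temporal IoU helper is included only because several benchmark names on this page use IoU thresholds.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Event:
    """One annotated event: a time span in seconds plus its caption."""
    start: float
    end: float
    sentence: str


def parse_video(record: Dict) -> List[Event]:
    """Pair each [start, end] timestamp with the sentence at the same index.

    Assumes the record has parallel "timestamps" and "sentences" lists
    (hypothetical key names chosen for this sketch).
    """
    return [
        Event(start=float(start), end=float(end), sentence=text.strip())
        for (start, end), text in zip(record["timestamps"], record["sentences"])
    ]


def temporal_iou(a: Event, b: Event) -> float:
    """Intersection over union of two time spans; 0.0 if they do not overlap."""
    intersection = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - intersection
    return intersection / union if union > 0 else 0.0


def mean_words_per_sentence(events: List[Event]) -> float:
    """Average whitespace-separated word count over the events' sentences."""
    if not events:
        return 0.0
    return sum(len(event.sentence.split()) for event in events) / len(events)


# Made-up record for illustration only; not taken from the dataset.
example_record = {
    "duration": 120.0,
    "timestamps": [[0.0, 30.0], [20.0, 60.0], [60.0, 120.0]],
    "sentences": [
        "A man walks onto a stage.",
        "He picks up a guitar and starts to play.",
        "The crowd claps as he finishes the song.",
    ],
}

events = parse_video(example_record)
overlap = temporal_iou(events[0], events[1])
average_length = mean_words_per_sentence(events)
```

Keeping events as small typed records rather than raw nested lists makes it easier to compute per-video statistics such as events per video or words per sentence, which are the figures the description reports.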

Benchmarks

10-shot image generation: Recall@Sum
Action Localization: Average F1; Average Precision; Average Recall
Dense Captioning: Live Score
Dense Video Captioning: METEOR; BLEU-3; BLEU-4; CIDEr; SODA; DIV-1; DIV-2; RE-4; BLEU4; F1; Precision; Recall
Temporal Action Localization: Average F1; Average Precision; Average Recall
Text to Video Retrieval: Recall@Sum
Video: Average F1; Average Precision; Average Recall; R@1,IoU=0.5; R@1,IoU=0.7; R@5,IoU=0.5; R@5,IoU=0.7
Video Captioning: BLEU4; BLEU-3; CIDEr; ROUGE-L; METEOR; BLEU-4; SODA; DIV-1; DIV-2; RE-4; F1; Precision; Recall; Live Score
Zero-Shot Learning: Average F1; Average Precision; Average Recall

Statistics

Papers: 255
Benchmarks: 45

Tasks

10-shot image generation
Action Localization
Dense Captioning
Dense Video Captioning
Live Video Captioning
Natural Language Moment Retrieval
Partially Relevant Video Retrieval
Temporal Action Localization
Temporal Action Proposal Generation
Text to Video Retrieval
Video
Video Captioning
Zero-Shot Learning