Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


ActivityNet

Modality: Videos · License: Unknown · Introduced: 2015-01-01

The ActivityNet dataset contains 200 activity classes and a total of 849 hours of video collected from YouTube. When the source paper was written, ActivityNet was the largest benchmark for temporal activity detection in terms of both the number of activity categories and the number of videos, which makes the task particularly challenging. Version 1.3 of the dataset contains 19,994 untrimmed videos, divided into three disjoint subsets (training, validation, and testing) in a 2:1:1 ratio. On average, each activity category has 137 untrimmed videos, and each video contains 1.41 activity instances annotated with temporal boundaries. Ground-truth annotations for the test videos are not publicly released.

Source: Dynamic Temporal Pyramid Network: A Closer Look at Multi-Scale Modeling for Activity Detection
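The statistics above can be sanity-checked with a small script. The sketch below assumes an ActivityNet-style annotation JSON with a top-level "database" key that maps video IDs to entries carrying "subset", "duration", and an "annotations" list of {"segment": [start, end], "label": ...} items. Treat that layout as an assumption and check it against the file you actually download. The toy records are invented, and summarize and split_sizes are hypothetical helper names.

```python
from collections import Counter

# Toy data in an assumed ActivityNet-style JSON layout; all values are invented.
toy = {
    "database": {
        "vid_a": {"subset": "training", "duration": 120.0,
                  "annotations": [{"segment": [5.0, 30.0], "label": "Surfing"},
                                  {"segment": [60.0, 90.0], "label": "Surfing"}]},
        "vid_b": {"subset": "training", "duration": 45.0,
                  "annotations": [{"segment": [0.0, 40.0], "label": "Ironing clothes"}]},
        "vid_c": {"subset": "validation", "duration": 200.0,
                  "annotations": [{"segment": [10.0, 50.0], "label": "Playing violin"}]},
        # Test entries have no public ground truth, so their list is empty.
        "vid_d": {"subset": "testing", "duration": 80.0, "annotations": []},
    }
}


def summarize(data):
    """Return (videos per subset, mean instances per labeled video, total hours)."""
    db = data["database"]
    per_subset = Counter(entry["subset"] for entry in db.values())
    # Only training/validation videos have public labels to average over.
    labeled = [entry for entry in db.values() if entry["subset"] != "testing"]
    n_instances = sum(len(entry["annotations"]) for entry in labeled)
    mean_instances = n_instances / len(labeled) if labeled else 0.0
    total_hours = sum(entry["duration"] for entry in db.values()) / 3600.0
    return per_subset, mean_instances, total_hours


def split_sizes(total, ratio=(2, 1, 1)):
    """Ideal (possibly fractional) subset sizes for `total` items split by `ratio`."""
    parts = sum(ratio)
    return tuple(total * r / parts for r in ratio)


per_subset, mean_instances, total_hours = summarize(toy)
print(dict(per_subset))
print(round(mean_instances, 2), round(total_hours, 3))
print(split_sizes(19994))
```

With 19,994 videos the ideal 2:1:1 split would be 9,997 / 4,998.5 / 4,998.5, so the ratio can only hold approximately; exact subset sizes should be read from the annotation file itself. The same arithmetic is a quick check on the quoted averages: 200 categories × 137 videos is about 27,400, more than 19,994, which suggests the per-category figure may come from a different release or counting convention.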

Benchmarks

Action Detection: mIoU
Action Recognition: mAP
Action Recognition In Videos: mAP
Activity Recognition: mAP
Video: text-to-video R@1, R@5, R@10, R@50, Mean Rank, Median Rank; video-to-text R@1, R@5, R@10, R@50, Mean Rank, Median Rank; Top 1 Accuracy, Top 5 Accuracy
Video Retrieval: text-to-video R@1, R@5, R@10, R@50, Mean Rank, Median Rank; video-to-text R@1, R@5, R@10, R@50, Mean Rank, Median Rank
Visual Question Answering (VQA): ClipMatch@1, ClipMatch@5, Contains, ExactMatch, Follow-up ClipMatch@1, Follow-up ClipMatch@5, Follow-up Contains, Follow-up ExactMatch
Zero-Shot Action Recognition: Top-1 Accuracy
Zero-Shot Video Retrieval: text-to-video R@1, R@5, R@10; video-to-text R@1, R@5, R@10
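Many of the retrieval benchmarks above report Recall@K together with median and mean rank. Below is a minimal sketch of how these are typically computed from a query-by-candidate similarity matrix. It assumes that query i's correct match is candidate i, a common convention for paired caption/video test sets that this page does not itself specify. retrieval_metrics is a hypothetical helper, and individual papers may differ in tie handling and in whether recall is reported as a fraction or a percentage.

```python
from statistics import mean, median


def retrieval_metrics(sim, ks=(1, 5, 10)):
    """Recall@K (in percent), median rank, and mean rank for a square similarity matrix.

    sim[i][j] is the score of candidate j for query i; the correct candidate
    for query i is assumed to be index i.
    """
    ranks = []
    for i, row in enumerate(sim):
        target = row[i]
        # 1-based rank: one plus the number of candidates scored strictly higher.
        # Ties therefore count in the query's favor (an optimistic convention).
        ranks.append(1 + sum(1 for score in row if score > target))
    n = len(ranks)
    recalls = {f"R@{k}": 100.0 * sum(r <= k for r in ranks) / n for k in ks}
    return recalls, median(ranks), mean(ranks)


# Toy text-to-video similarities: rows are captions, columns are videos.
sim = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.5, 0.6, 0.7],
]
t2v = retrieval_metrics(sim, ks=(1, 2))

# Video-to-text reuses the same function on the transposed matrix.
sim_t = [list(col) for col in zip(*sim)]
v2t = retrieval_metrics(sim_t, ks=(1, 2))
print(t2v)
print(v2t)
```

Computing both directions from one matrix keeps text-to-video and video-to-text numbers consistent with each other; only the axis that plays the role of "query" changes.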

Related Benchmarks

ActivityNet Adverbs/Video: Acc-A, mAP M, mAP W
ActivityNet Adverbs/Video Retrieval: Acc-A, mAP M, mAP W
ActivityNet Adverbs/Video-Adverb Retrieval: Acc-A, mAP M, mAP W
ActivityNet Captions/10-shot image generation: Recall@Sum
ActivityNet Captions/Action Localization: Average F1, Average Precision, Average Recall
ActivityNet Captions/Dense Captioning: Live Score
ActivityNet Captions/Dense Video Captioning: BLEU-3, BLEU-4, BLEU4, CIDEr, DIV-1, DIV-2, F1, METEOR, Precision, RE-4, Recall, SODA
ActivityNet Captions/Temporal Action Localization: Average F1, Average Precision, Average Recall
ActivityNet Captions/Text to Video Retrieval: Recall@Sum
ActivityNet Captions/Video: Average F1, Average Precision, Average Recall, R@1 (IoU=0.5), R@1 (IoU=0.7), R@5 (IoU=0.5), R@5 (IoU=0.7)
ActivityNet Captions/Video Captioning: BLEU-3, BLEU-4, BLEU4, CIDEr, DIV-1, DIV-2, F1, Live Score, METEOR, Precision, RE-4, ROUGE-L, Recall, SODA
ActivityNet Captions/Zero-Shot Learning: Average F1, Average Precision, Average Recall
ActivityNet-1.2/Action Localization: Mean mAP, mAP IOU@0.1, mAP IOU@0.3, mAP IOU@0.5, mAP IOU@0.7, mAP@0.5
ActivityNet-1.2/Temporal Action Localization: Mean mAP, mAP IOU@0.1, mAP IOU@0.3, mAP IOU@0.5, mAP IOU@0.7, mAP@0.5
ActivityNet-1.2/Video: Mean mAP, mAP, mAP IOU@0.1, mAP IOU@0.3, mAP IOU@0.5, mAP IOU@0.7, mAP@0.5
ActivityNet-1.2/Weakly Supervised Action Localization: Mean mAP, mAP@0.5
ActivityNet-1.2/Zero-Shot Learning: Mean mAP, mAP IOU@0.1, mAP IOU@0.3, mAP IOU@0.5, mAP IOU@0.7, mAP@0.5
ActivityNet-1.3/Action Localization: AR@100, AUC (test), AUC (val), mAP, mAP IOU@0.5, mAP IOU@0.75, mAP IOU@0.95, mAP@0.5, mAP@0.5:0.95
ActivityNet-1.3/Temporal Action Localization: AR@100, AUC (test), AUC (val), mAP, mAP IOU@0.5, mAP IOU@0.75, mAP IOU@0.95, mAP@0.5, mAP@0.5:0.95
ActivityNet-1.3/Video: AR@100, AUC (test), AUC (val), mAP, mAP IOU@0.5, mAP IOU@0.75, mAP IOU@0.95, mAP@0.5, mAP@0.5:0.95
ActivityNet-1.3/Weakly Supervised Action Localization: mAP@0.5, mAP@0.5:0.95
ActivityNet-1.3/Weakly-supervised Temporal Action Localization: mAP, mAP IOU@0.5, mAP IOU@0.75, mAP IOU@0.95
ActivityNet-1.3/Zero-Shot Learning: AR@100, AUC (test), AUC (val), mAP, mAP IOU@0.5, mAP IOU@0.75, mAP IOU@0.95, mAP@0.5, mAP@0.5:0.95
ActivityNet-GZSL (cls)/Zero-Shot Learning: HM, ZSL
ActivityNet-GZSL(main)/Zero-Shot Learning: HM, ZSL
ActivityNet-QA/Question Answering: Accuracy, Confidence Score
ActivityNet-QA/Video Question Answering: Accuracy, Confidence Score, Confidence score
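The temporal localization benchmarks above (for example mAP IOU@0.5 and mAP@0.5:0.95 on ActivityNet-1.3) rest on temporal IoU (tIoU) between predicted and ground-truth segments. The sketch below computes tIoU, a single-class AP at one threshold with greedy one-to-one matching, and the average over tIoU thresholds 0.50:0.05:0.95. It uses VOC-style all-point interpolated AP; the official ActivityNet evaluation code may differ in details, so treat this as an illustration rather than a drop-in replacement. tiou, average_precision, and the toy segments are hypothetical. A full mAP would additionally average AP over activity classes.

```python
def tiou(a, b):
    """Temporal IoU of two (start, end) segments given in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds, gts, thr):
    """Single-class AP at one tIoU threshold.

    preds: (video_id, start, end, score) tuples; gts: (video_id, start, end) tuples.
    Predictions are processed by descending score, and each ground-truth
    segment can be matched at most once.
    """
    if not gts:
        return 0.0
    used = [False] * len(gts)
    hits = []  # 1 for a true positive, 0 for a false positive, in score order
    for vid, start, end, _ in sorted(preds, key=lambda p: p[3], reverse=True):
        # Ground truths from the same video, best overlap first.
        cands = sorted(
            ((tiou((start, end), (g[1], g[2])), k) for k, g in enumerate(gts) if g[0] == vid),
            reverse=True,
        )
        match = next((k for ov, k in cands if ov >= thr and not used[k]), None)
        if match is not None:
            used[match] = True
        hits.append(1 if match is not None else 0)

    # Precision and recall after each prediction.
    precisions, recalls, tp = [], [], 0
    for n, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / n)
        recalls.append(tp / len(gts))

    # All-point interpolation: make precision non-increasing, then integrate over recall.
    mprec = [0.0] + precisions + [0.0]
    mrec = [0.0] + recalls + [1.0]
    for i in range(len(mprec) - 2, -1, -1):
        mprec[i] = max(mprec[i], mprec[i + 1])
    return sum((mrec[i] - mrec[i - 1]) * mprec[i] for i in range(1, len(mrec)))


# Toy data for one activity class (invented for illustration).
gts = [("v1", 10.0, 20.0), ("v1", 30.0, 40.0), ("v2", 0.0, 10.0)]
preds = [
    ("v1", 11.0, 20.0, 0.9),
    ("v1", 30.0, 45.0, 0.8),
    ("v2", 20.0, 30.0, 0.7),
    ("v1", 10.0, 20.0, 0.6),
]

thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]  # 0.5, 0.55, ..., 0.95
aps = {t: average_precision(preds, gts, t) for t in thresholds}
avg_ap = sum(aps.values()) / len(aps)
print(aps[0.5], aps[0.75], aps[0.95], avg_ap)
```

In the toy example the second prediction overlaps its ground truth with tIoU 2/3, so it counts as a hit at thresholds up to 0.65 and as a false positive above that. This is why AP falls as the threshold tightens, and why the averaged mAP@0.5:0.95 is lower than mAP at 0.5 alone.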

Statistics

Papers: 807
Benchmarks: 45

Links

Homepage

Tasks

Action Classification, Action Detection, Action Recognition, Action Recognition In Videos, Activity Recognition, Few Shot Temporal Action Localization, GZSL Video Classification, Semi-Supervised Action Detection, Temporal Action Localization, Temporal Action Proposal Generation, Video, Video Retrieval, Visual Question Answering (VQA), Weakly Supervised Action Localization, Weakly-supervised Temporal Action Localization, ZSL Video Classification, Zero-Shot Action Detection, Zero-Shot Action Recognition, Zero-Shot Video Retrieval