Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Kinetics

Kinetics Human Action Video Dataset

Videos · CC BY 4.0 · Introduced 2017-05-19

The Kinetics dataset is a large-scale, high-quality dataset for human action recognition in videos. The dataset consists of around 500,000 video clips covering 600 human action classes with at least 600 video clips for each action class. Each video clip lasts around 10 seconds and is labeled with a single action class. The videos are collected from YouTube.
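The properties described above (one action label per clip, a minimum number of clips per class) can be checked against any annotation listing. The following is a minimal sketch, assuming the annotations are available as `(clip_id, label)` pairs; the function name `summarize_labels` and the input layout are hypothetical and do not reflect the official Kinetics file format.

```python
from collections import Counter


def summarize_labels(annotations, min_clips_per_class=600):
    """Summarize a single-label clip listing.

    annotations: iterable of (clip_id, label) pairs (assumed layout,
    not the official Kinetics annotation format).
    min_clips_per_class: threshold taken from the description above.
    """
    # Count how many clips carry each action label.
    counts = Counter(label for _, label in annotations)
    # Classes that fall below the stated per-class minimum.
    under = sorted(label for label, n in counts.items() if n < min_clips_per_class)
    return {
        "num_clips": sum(counts.values()),
        "num_classes": len(counts),
        "under_represented": under,
    }
```

On a full listing, `num_classes` and `under_represented` give a quick check of the class count and per-class minimum quoted in the description.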

Source: Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey

Benchmarks

Text-to-Video Generation/Accuracy
Video/Top-1
Video Classification/Top-1
Visual Tracking/Average Jaccard
Zero-Shot Action Recognition/Top-1 Accuracy
Zero-Shot Action Recognition/Top-5 Accuracy
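Several of the benchmarks above report Top-1 and Top-5 accuracy. As a rough illustration of how such a metric is usually computed, here is a minimal sketch, assuming per-clip class scores and integer ground-truth labels; the function name `top_k_accuracy` and the input layout are illustrative assumptions, not the evaluation code used by any listed benchmark.

```python
def top_k_accuracy(scores, labels, k=1):
    """Fraction of clips whose true label is among the k highest-scoring classes.

    scores: list of per-clip score lists, one score per class (assumed layout).
    labels: list of integer class indices, one per clip.
    """
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and the same length")
    correct = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the highest scores for this clip.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        correct += label in top_k
    return correct / len(labels)
```

With `k=1` this reduces to ordinary classification accuracy; `k=5` corresponds to the Top-5 variant.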

Related Benchmarks

Kinetics-100/Action Recognition/Accuracy
Kinetics-100/Activity Recognition/Accuracy
Kinetics-400/3D Action Recognition/Actions Top-1 (S1)
Kinetics-400/Action Detection/Actions Top-1 (S1)
Kinetics-400/Action Localization/Actions Top-1 (S1)
Kinetics-400/Action Recognition/Actions Top-1 (S1)
Kinetics-400/Action Recognition/Top-1 Accuracy
Kinetics-400/Action Recognition/Top-1 accuracy %
Kinetics-400/Action Recognition/Top-5 Accuracy
Kinetics-400/Action Recognition/Top-5 Accuracy %
Kinetics-400/Action Recognition In Videos/Top-1 Accuracy
Kinetics-400/Action Recognition In Videos/Top-5 Accuracy
Kinetics-400/Activity Recognition/Actions Top-1 (S1)
Kinetics-400/Activity Recognition/Top-1 Accuracy
Kinetics-400/Activity Recognition/Top-1 accuracy %
Kinetics-400/Activity Recognition/Top-5 Accuracy
Kinetics-400/Activity Recognition/Top-5 Accuracy %
Kinetics-400/Boundary Detection/Pairwise F1
Kinetics-400/Boundary Detection/Precision
Kinetics-400/Boundary Detection/Recall
Kinetics-400/Event Segmentation/F1
Kinetics-400/Temporal Action Localization/Actions Top-1 (S1)
Kinetics-400/Video/Acc@1
Kinetics-400/Video/Acc@5
Kinetics-400/Video/Actions Top-1 (S1)
Kinetics-400/Video/Clip acc@1
Kinetics-400/Video/Clip acc@5
Kinetics-400/Video/FLOPs (G) x views
Kinetics-400/Video/Parameters (M)
Kinetics-400/Zero-Shot Learning/Actions Top-1 (S1)
Kinetics-600/Action Recognition/Top-1 Accuracy
Kinetics-600/Action Recognition/Top-5 Accuracy
Kinetics-600/Action Recognition In Videos/Top-1 Accuracy
Kinetics-600/Action Recognition In Videos/Top-5 Accuracy
Kinetics-600/Activity Recognition/Top-1 Accuracy
Kinetics-600/Activity Recognition/Top-5 Accuracy
Kinetics-600/Video/GFLOPs
Kinetics-600/Video/Top-1 Accuracy
Kinetics-600/Video/Top-5 Accuracy
Kinetics-600 12 frames, 128x128/Video/FID
Kinetics-600 12 frames, 128x128/Video Generation/FID
Kinetics-600 12 frames, 64x64/Video/Cond
Kinetics-600 12 frames, 64x64/Video/FVD
Kinetics-600 12 frames, 64x64/Video/IS
Kinetics-600 12 frames, 64x64/Video/Pred
Kinetics-600 12 frames, 64x64/Video Generation/FVD
Kinetics-600 12 frames, 64x64/Video Prediction/Cond
Kinetics-600 12 frames, 64x64/Video Prediction/FVD
Kinetics-600 12 frames, 64x64/Video Prediction/IS
Kinetics-600 12 frames, 64x64/Video Prediction/Pred
Kinetics-600 48 frames, 64x64/Video/FID
Kinetics-600 48 frames, 64x64/Video/Inception Score
Kinetics-600 48 frames, 64x64/Video Generation/FID
Kinetics-600 48 frames, 64x64/Video Generation/Inception Score
Kinetics-700/Image Clustering/Accuracy
Kinetics-700/Video/FID
Kinetics-700/Video/FVD
Kinetics-700/Video/Top-1 Accuracy
Kinetics-700/Video/Top-5 Accuracy
Kinetics-700/Video Generation/FID
Kinetics-700/Video Generation/FVD
Kinetics-700-2020/Video/Top 1 Accuracy
Kinetics-GEB+/10-shot image generation/mAP
Kinetics-GEB+/10-shot image generation/text-to-video R@1
Kinetics-GEB+/10-shot image generation/text-to-video R@10
Kinetics-GEB+/10-shot image generation/text-to-video R@5
Kinetics-GEB+/10-shot image generation/text-to-video R@50
Kinetics-GEB+/Text to Video Retrieval/mAP
Kinetics-GEB+/Text to Video Retrieval/text-to-video R@1
Kinetics-GEB+/Text to Video Retrieval/text-to-video R@10
Kinetics-GEB+/Text to Video Retrieval/text-to-video R@5
Kinetics-GEB+/Text to Video Retrieval/text-to-video R@50
Kinetics-GEB+/Video/F1@0.1s
Kinetics-GEB+/Video/F1@0.2s
Kinetics-GEB+/Video/F1@0.5s
Kinetics-GEB+/Video/F1@1.0s
Kinetics-GEB+/Video/F1@1.5s
Kinetics-GEB+/Video/F1@2.0s
Kinetics-GEB+/Video/F1@2.5s
Kinetics-GEB+/Video/F1@3.0s
Kinetics-GEB+/Video/F1@Avg
Kinetics-GEB+/Video Captioning/CIDEr
Kinetics-GEB+/Video Captioning/ROUGE-L
Kinetics-GEB+/Video Captioning/SPICE
Kinetics-GEB+/Video Grounding/F1@0.1s
Kinetics-GEB+/Video Grounding/F1@0.2s
Kinetics-GEB+/Video Grounding/F1@0.5s
Kinetics-GEB+/Video Grounding/F1@1.0s
Kinetics-GEB+/Video Grounding/F1@1.5s
Kinetics-GEB+/Video Grounding/F1@2.0s
Kinetics-GEB+/Video Grounding/F1@2.5s
Kinetics-GEB+/Video Grounding/F1@3.0s
Kinetics-GEB+/Video Grounding/F1@Avg
Kinetics-GEB+/Video Retrieval/F1@0.1s
Kinetics-GEB+/Video Retrieval/F1@0.2s
Kinetics-GEB+/Video Retrieval/F1@0.5s
Kinetics-GEB+/Video Retrieval/F1@1.0s
Kinetics-GEB+/Video Retrieval/F1@1.5s
Kinetics-GEB+/Video Retrieval/F1@2.0s
Kinetics-GEB+/Video Retrieval/F1@2.5s
Kinetics-GEB+/Video Retrieval/F1@3.0s
Kinetics-GEB+/Video Retrieval/F1@Avg
Kinetics-GEBD/Event Segmentation/F1 @ RelDis. 0.05
Kinetics-Skeleton dataset/3D Action Recognition/Accuracy
Kinetics-Skeleton dataset/3D Action Recognition/GFLOPS per prediction
Kinetics-Skeleton dataset/Action Detection/Accuracy
Kinetics-Skeleton dataset/Action Detection/GFLOPS per prediction
Kinetics-Skeleton dataset/Action Localization/Accuracy
Kinetics-Skeleton dataset/Action Localization/GFLOPS per prediction
Kinetics-Skeleton dataset/Action Recognition/Accuracy
Kinetics-Skeleton dataset/Action Recognition/GFLOPS per prediction
Kinetics-Skeleton dataset/Activity Recognition/Accuracy
Kinetics-Skeleton dataset/Activity Recognition/GFLOPS per prediction
Kinetics-Skeleton dataset/Temporal Action Localization/Accuracy
Kinetics-Skeleton dataset/Temporal Action Localization/GFLOPS per prediction
Kinetics-Skeleton dataset/Video/Accuracy
Kinetics-Skeleton dataset/Video/GFLOPS per prediction
Kinetics-Skeleton dataset/Zero-Shot Learning/Accuracy
Kinetics-Skeleton dataset/Zero-Shot Learning/GFLOPS per prediction
Kinetics-Sounds/Video/Top 1 Accuracy
Kinetics-Sounds/Video/Top 5 Accuracy

Statistics

Papers: 1,341
Benchmarks: 6

Links

Homepage

Tasks

Action Classification
Action Recognition
Action Recognition In Videos
Boundary Captioning
Boundary Detection
Boundary Grounding
Event Segmentation
Few Shot Action Recognition
Generic Event Boundary Detection
Image Clustering
Long-tail Learning
Self-Supervised Action Recognition
Self-Supervised Action Recognition Linear
Semantic Object Interaction Classification
Skeleton Based Action Recognition
Spatio-Temporal Action Localization
Temporal Action Localization
Text to Video Retrieval
Text-to-Video Generation
Video
Video Captioning
Video Classification
Video Generation
Video Grounding
Video Prediction
Video Recognition
Video Retrieval
Video Understanding
Visual Tracking
Zero-Shot Action Recognition
imbalanced classification