UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild

Khurram Soomro, Amir Roshan Zamir, Mubarak Shah

Published: 2012-12-03
Tasks: Skeleton Based Action Recognition · Action Recognition · Action Recognition In Videos · Temporal Action Localization

Abstract

We introduce UCF101, currently the largest dataset of human actions. It consists of 101 action classes, over 13k clips, and 27 hours of video data. The database consists of realistic, user-uploaded videos containing camera motion and cluttered backgrounds. Additionally, we provide baseline action recognition results on this new dataset using a standard bag-of-words approach, with an overall accuracy of 44.5%. To the best of our knowledge, UCF101 is currently the most challenging dataset of actions due to its large number of classes, large number of clips, and the unconstrained nature of those clips.
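The abstract describes the baseline only at a high level. As a rough illustration of what a standard bag-of-words action recognition pipeline typically looks like, the sketch below quantizes local spatio-temporal descriptors against a k-means vocabulary, represents each clip as a normalized word histogram, and trains a linear SVM on the histograms. The descriptor extractor, vocabulary size, and all function names here are illustrative assumptions, not the paper's actual implementation.

```python
# Rough sketch of a standard bag-of-words action recognition baseline.
# The descriptor extractor is a stub, and every name here is illustrative;
# this is not the pipeline from the UCF101 paper itself.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def extract_local_descriptors(clip):
    """Placeholder: return an (n, d) array of local spatio-temporal
    descriptors (e.g., HOG/HOF patches) for one video clip."""
    raise NotImplementedError

def build_vocabulary(train_clips, k=4000):
    # Cluster training descriptors into k visual words (assumed vocabulary size).
    descriptors = np.vstack([extract_local_descriptors(c) for c in train_clips])
    return KMeans(n_clusters=k, n_init=4).fit(descriptors)

def encode(clip, vocab):
    # L1-normalized histogram of visual-word assignments for one clip.
    words = vocab.predict(extract_local_descriptors(clip))
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

def train_and_evaluate(train_clips, train_labels, test_clips, test_labels):
    vocab = build_vocabulary(train_clips)
    X_train = np.array([encode(c, vocab) for c in train_clips])
    X_test = np.array([encode(c, vocab) for c in test_clips])
    clf = LinearSVC().fit(X_train, train_labels)
    return clf.score(X_test, test_labels)  # fraction of test clips correct
```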

Results

Task                           Dataset   Metric            Value   Model
Activity Recognition           UCF101    3-fold Accuracy   43.9    Baseline UCF101
Action Recognition             UCF101    3-fold Accuracy   43.9    Baseline UCF101
Action Recognition In Videos   UCF101    3-fold Accuracy   43.9    Baseline UCF101
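The "3-fold Accuracy" in the table averages classification accuracy over the three official UCF101 train/test splits, rather than over random cross-validation folds. A minimal sketch, assuming a user-supplied evaluate_split callback (the callback is hypothetical; the trainlist/testlist file names follow the official split release):

```python
# Minimal sketch of the 3-fold UCF101 evaluation protocol: train and test
# once per official split, then average the three accuracies.
import numpy as np

def three_fold_accuracy(evaluate_split):
    """`evaluate_split(i)` is a hypothetical callback that trains on
    trainlist0{i}.txt, tests on testlist0{i}.txt, and returns accuracy."""
    return float(np.mean([evaluate_split(i) for i in (1, 2, 3)]))
```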

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception (2025-06-26)
Feature Hallucination for Self-supervised Action Recognition (2025-06-25)
CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition (2025-06-25)
Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)
Adapting Vision-Language Models for Evaluating World Models (2025-06-22)