Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization

Hang Zhao, Antonio Torralba, Lorenzo Torresani, Zhicheng Yan

Published: 26 December 2017 · ICCV 2019
Tasks: Action Classification, Action Localization, Transfer Learning, Temporal Localization, Action Recognition, Temporal Action Localization

Abstract

This paper presents a new large-scale dataset for recognition and temporal localization of human actions collected from Web videos. We refer to it as HACS (Human Action Clips and Segments). We leverage both consensus and disagreement among visual classifiers to automatically mine candidate short clips from unlabeled videos, which are subsequently validated by human annotators. The resulting dataset is dubbed HACS Clips. Through a separate process we also collect annotations defining action segment boundaries. This resulting dataset is called HACS Segments. Overall, HACS Clips consists of 1.5M annotated clips sampled from 504K untrimmed videos, and HACS Segments contains 139K action segments densely annotated in 50K untrimmed videos spanning 200 action categories. HACS Clips contains more labeled examples than any existing video benchmark. This renders our dataset both a large-scale action recognition benchmark and an excellent source for spatiotemporal feature learning. In our transfer learning experiments on three target datasets, HACS Clips outperforms Kinetics-600, Moments-In-Time and Sports1M as a pretraining source. On HACS Segments, we evaluate state-of-the-art methods of action proposal generation and action localization, and highlight the new challenges posed by our dense temporal annotations.
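To make the clip-mining step concrete, the following is a minimal Python sketch of the consensus/disagreement idea described in the abstract. It assumes an array of per-clip confidence scores produced by several independently trained classifiers for one action class; the function name, thresholds, and selection rule are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np

def mine_candidate_clips(clip_scores, agree_thresh=0.8, disagree_margin=0.5):
    """Pick candidate clips to send to human annotators.

    clip_scores: array of shape (num_clips, num_classifiers) with each
    classifier's confidence that a clip shows the target action.

    Clips where all classifiers agree with high confidence become likely
    positives; clips where classifiers strongly disagree are ambiguous and
    also worth annotating. Both sets are forwarded for human validation.
    """
    scores = np.asarray(clip_scores, dtype=float)
    mean_conf = scores.mean(axis=1)                    # average confidence per clip
    spread = scores.max(axis=1) - scores.min(axis=1)   # disagreement per clip

    consensus_idx = np.where((mean_conf >= agree_thresh) & (spread < 0.2))[0]
    disagreement_idx = np.where(spread >= disagree_margin)[0]
    return consensus_idx, disagreement_idx
```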

Results

Task                         | Dataset | Metric      | Value | Model
-----------------------------|---------|-------------|-------|------
Video                        | HACS    | Average-mAP | 18.97 | SSN
Video                        | HACS    | mAP@0.5     | 28.82 | SSN
Video                        | HACS    | mAP@0.75    | 18.8  | SSN
Video                        | HACS    | mAP@0.95    | 5.32  | SSN
Temporal Action Localization | HACS    | Average-mAP | 18.97 | SSN
Temporal Action Localization | HACS    | mAP@0.5     | 28.82 | SSN
Temporal Action Localization | HACS    | mAP@0.75    | 18.8  | SSN
Temporal Action Localization | HACS    | mAP@0.95    | 5.32  | SSN
Zero-Shot Learning           | HACS    | Average-mAP | 18.97 | SSN
Zero-Shot Learning           | HACS    | mAP@0.5     | 28.82 | SSN
Zero-Shot Learning           | HACS    | mAP@0.75    | 18.8  | SSN
Zero-Shot Learning           | HACS    | mAP@0.95    | 5.32  | SSN
Action Localization          | HACS    | Average-mAP | 18.97 | SSN
Action Localization          | HACS    | mAP@0.5     | 28.82 | SSN
Action Localization          | HACS    | mAP@0.75    | 18.8  | SSN
Action Localization          | HACS    | mAP@0.95    | 5.32  | SSN
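The mAP@tIoU and Average-mAP figures above follow the usual temporal action localization protocol: predicted segments are matched to ground-truth segments at a temporal IoU threshold, average precision is computed per class, and Average-mAP averages the resulting mAP over thresholds from 0.5 to 0.95. Below is a minimal sketch of that evaluation for a single class, assuming simple (video_id, start, end, score) predictions; the official HACS evaluation code may differ in matching and interpolation details.

```python
import numpy as np

def temporal_iou(pred, gt):
    """Temporal IoU between two segments given as (start, end) in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def average_precision(preds, gts, tiou_thresh):
    """AP for one action class.

    preds: list of (video_id, start, end, score); gts: list of (video_id, start, end).
    A prediction is a true positive if it matches a not-yet-claimed ground-truth
    segment in the same video with tIoU >= tiou_thresh.
    """
    preds = sorted(preds, key=lambda p: -p[3])
    claimed = set()
    tp = np.zeros(len(preds))
    for i, (vid, s, e, _) in enumerate(preds):
        best_iou, best_j = 0.0, None
        for j, (gvid, gs, ge) in enumerate(gts):
            if gvid != vid or j in claimed:
                continue
            iou = temporal_iou((s, e), (gs, ge))
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j is not None and best_iou >= tiou_thresh:
            tp[i] = 1.0
            claimed.add(best_j)
    cum_tp = np.cumsum(tp)
    precision = cum_tp / (np.arange(len(preds)) + 1)
    recall = cum_tp / max(len(gts), 1)
    # Non-interpolated AP: area under the precision-recall curve.
    return float(np.sum(precision * np.diff(np.concatenate(([0.0], recall)))))

# Average-mAP averages the per-class mAP over tIoU thresholds 0.5, 0.55, ..., 0.95.
tiou_thresholds = np.arange(0.5, 1.0, 0.05)
```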

Related Papers

RaMen: Multi-Strategy Multi-Modal Learning for Bundle Construction (2025-07-18)
Disentangling coincident cell events using deep transfer learning and compressive sensing (2025-07-17)
A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
Best Practices for Large-Scale, Pixel-Wise Crop Mapping and Transfer Learning Workflows (2025-07-16)
DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
Robust-Multi-Task Gradient Boosting (2025-07-15)
Calibrated and Robust Foundation Models for Vision-Language and Medical Image Tasks Under Distribution Shift (2025-07-12)
The Bayesian Approach to Continual Learning: An Overview (2025-07-11)