Finding Action Tubes with a Sparse-to-Dense Framework

Yuxi Li, Weiyao Lin, Tao Wang, John See, Rui Qian, Ning Xu, Li-Min Wang, Shugong Xu

2020-08-30 | Action Detection

Abstract

The task of spatial-temporal action detection has attracted increasing attention among researchers. The dominant existing methods solve this problem by relying on short-term information and dense serial-wise detection on individual frames or clips. Despite their effectiveness, these methods make inadequate use of long-term information and are prone to inefficiency. In this paper, we propose, for the first time, an efficient framework that generates action tube proposals from video streams with a single forward pass in a sparse-to-dense manner. The framework has two key characteristics: (1) both long-term and short-term sampled information are explicitly utilized in our spatiotemporal network, and (2) a new dynamic feature sampling module (DTS) is designed to effectively approximate the tube output while keeping the system tractable. We evaluate the efficacy of our model on the UCF101-24, JHMDB-21 and UCFSports benchmark datasets, achieving promising results that are competitive with state-of-the-art methods. The proposed sparse-to-dense strategy makes our framework about 7.6 times more efficient than the nearest competitor.
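To make the general idea of a sparse-to-dense tube concrete, here is a minimal sketch. It is not the paper's DTS module, which operates inside the network; it only assumes that boxes are available at a few sparse frame indices and fills in every intermediate frame by plain linear interpolation. The function name `densify_tube` and the `(x1, y1, x2, y2)` box format are hypothetical choices for illustration.

```python
import bisect
from typing import Dict, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def densify_tube(sparse: Dict[int, Box], start: int, end: int) -> Dict[int, Box]:
    """Expand boxes given at sparse frame indices into one box per frame in [start, end].

    Frames between two keyframes are linearly interpolated; frames before the
    first or after the last keyframe reuse the nearest keyframe box.
    """
    keys = sorted(sparse)
    if not keys:
        raise ValueError("need at least one sparse box")
    dense: Dict[int, Box] = {}
    for t in range(start, end + 1):
        if t <= keys[0]:
            dense[t] = sparse[keys[0]]
        elif t >= keys[-1]:
            dense[t] = sparse[keys[-1]]
        else:
            # Bracketing keyframes k0 < t <= k1.
            i = bisect.bisect_left(keys, t)
            k0, k1 = keys[i - 1], keys[i]
            alpha = (t - k0) / (k1 - k0)
            dense[t] = tuple(
                (1 - alpha) * a + alpha * b for a, b in zip(sparse[k0], sparse[k1])
            )
    return dense


if __name__ == "__main__":
    sparse = {0: (0, 0, 10, 10), 4: (4, 4, 14, 14)}
    dense = densify_tube(sparse, 0, 4)
    print(dense[2])  # (2.0, 2.0, 12.0, 12.0)
```

Linear interpolation is the simplest possible densification and is shown only to illustrate the output shape: a handful of sparse predictions becomes one box per frame.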

Results

Task	Dataset	Metric (IoU threshold)	Value (%)	Model
Action Detection	UCF101-24	Video-mAP @ 0.5	54	DTS
Action Detection	UCF Sports	Video-mAP @ 0.2	94.3	DTS
Action Detection	UCF Sports	Video-mAP @ 0.5	93.8	DTS
Action Detection	J-HMDB	Video-mAP @ 0.2	76.1	DTS
Action Detection	J-HMDB	Video-mAP @ 0.5	74.3	DTS
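The Video-mAP thresholds in the table (0.2, 0.5) give the spatio-temporal overlap a detected tube must reach with a ground-truth tube to count as correct. A commonly used definition multiplies the temporal IoU of the two tubes by their mean per-frame box IoU over the shared frames. The sketch below assumes tubes are stored as frame-to-box dicts covering contiguous frames; the helper names `box_iou` and `tube_iou` are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical box format: (x1, y1, x2, y2).
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def tube_iou(pred: Dict[int, Box], gt: Dict[int, Box]) -> float:
    """Spatio-temporal IoU: temporal IoU times mean spatial IoU on shared frames."""
    shared = set(pred) & set(gt)
    if not shared:
        return 0.0
    # Temporal IoU computed on frame sets (equivalent to intervals for contiguous tubes).
    temporal = len(shared) / len(set(pred) | set(gt))
    spatial = sum(box_iou(pred[t], gt[t]) for t in shared) / len(shared)
    return temporal * spatial


if __name__ == "__main__":
    pred = {t: (0, 0, 10, 10) for t in range(0, 10)}
    gt = {t: (0, 0, 10, 10) for t in range(5, 15)}
    print(tube_iou(pred, gt))  # about 0.333: 5 shared frames out of 15, boxes identical
```

Under this definition, a detection at "Video-mAP @ 0.5" is a true positive only if its `tube_iou` with a not-yet-matched ground-truth tube of the same class is at least 0.5; average precision is then computed per class and averaged.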
