Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection

Min Yang, Guo Chen, Yin-Dong Zheng, Tong Lu, Limin Wang

2022-05-05 · Action Detection · Video Understanding · Temporal Action Localization · Object Detection

Paper · PDF · Code (official)

Abstract

Temporal action detection (TAD) is extensively studied in the video understanding community, generally following the object detection pipeline in images. However, complex designs are not uncommon in TAD, such as two-stream feature extraction, multi-stage training, complex temporal modeling, and global context fusion. In this paper, we do not aim to introduce any novel technique for TAD. Instead, we study a simple, straightforward, yet must-know baseline given the current status of complex design and low detection efficiency in TAD. In our simple baseline (termed BasicTAD), we decompose the TAD pipeline into several essential components: data sampling, backbone design, neck construction, and detection head. We extensively investigate the existing techniques in each component for this baseline and, more importantly, perform end-to-end training over the entire pipeline thanks to the simplicity of the design. As a result, this simple BasicTAD yields an astounding and real-time RGB-only baseline very close to the state-of-the-art methods with two-stream inputs. In addition, we further improve BasicTAD by preserving more temporal and spatial information in the network representation (termed PlusTAD). Empirical results demonstrate that our PlusTAD is very efficient and significantly outperforms previous methods on the THUMOS14 and FineAction datasets. Meanwhile, we also perform in-depth visualization and error analysis on our proposed method and try to provide more insights into the TAD problem. Our approach can serve as a strong baseline for future TAD research. The code and model will be released at https://github.com/MCG-NJU/BasicTAD.
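The model tags in the results below, e.g. (160, 6, 192), follow the paper's input configurations: spatial resolution, frame stride, and number of sampled frames. The data-sampling component described in the abstract can be illustrated with a minimal plain-Python sketch; the function name, clamping behavior, and arguments here are illustrative assumptions, not the authors' implementation:

```python
def sample_frame_indices(start, stride, num_frames, total_frames):
    """Uniformly sample frame indices from a video, starting at `start`.

    Mirrors the (resolution, stride, num_frames) configs in the results,
    e.g. (160, 6, 192): 192 frames taken at every 6th frame. Indices past
    the end of the video are clamped to the last frame (a common padding
    choice; an assumption here, not necessarily what BasicTAD does).
    """
    return [min(start + i * stride, total_frames - 1) for i in range(num_frames)]

# Example: the (160, 6, 192) config covers a window of 192 * 6 = 1152 raw frames.
idx = sample_frame_indices(start=0, stride=6, num_frames=192, total_frames=2000)
```

With a denser stride and fewer frames, the (112, 3, 96) config trades temporal coverage for speed, which is reflected in its lower but still competitive mAP in the table below.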

Results

Temporal Action Localization on THUMOS'14 (the same results are also indexed under the Video, Zero-Shot Learning, and Action Localization task labels):

| Metric | BasicTAD (160,6,192, R50-SlowOnly) | BasicTAD (112,3,96, R50-SlowOnly) |
|---|---|---|
| Avg mAP (0.3:0.7) | 59.6 | 54.9 |
| mAP @ IoU 0.3 | 75.5 | 68.4 |
| mAP @ IoU 0.4 | 70.8 | 65.0 |
| mAP @ IoU 0.5 | 63.5 | 58.6 |
| mAP @ IoU 0.6 | 50.9 | 49.2 |
| mAP @ IoU 0.7 | 37.4 | 33.5 |
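The Avg mAP (0.3:0.7) figure is the arithmetic mean of the per-threshold mAP values at tIoU 0.3 through 0.7, so the headline numbers can be checked directly from the table:

```python
# Per-threshold mAP values from the results table (IoU 0.3, 0.4, 0.5, 0.6, 0.7).
map_160 = [75.5, 70.8, 63.5, 50.9, 37.4]  # BasicTAD (160,6,192, R50-SlowOnly)
map_112 = [68.4, 65.0, 58.6, 49.2, 33.5]  # BasicTAD (112,3,96, R50-SlowOnly)

def avg_map(values):
    """Average mAP over the tIoU thresholds, rounded to one decimal."""
    return round(sum(values) / len(values), 1)

print(avg_map(map_160))  # 59.6
print(avg_map(map_112))  # 54.9
```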

Related Papers

- VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding (2025-07-17)
- A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
- RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)
- Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection (2025-07-17)
- Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis (2025-07-17)
- DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
- Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
- UGC-VideoCaptioner: An Omni UGC Video Detail Caption Model and New Benchmarks (2025-07-15)