Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


TadML: A fast temporal action detection with Mechanics-MLP

Bowen Deng, Dongchang Liu

2022-06-07 · Action Detection · Optical Flow Estimation · Temporal Localization · Temporal Action Localization
Paper · PDF · Code (official)

Abstract

Temporal Action Detection (TAD) is a crucial but challenging task in video understanding. It aims to detect both the type and the start and end frames of each action instance in a long, untrimmed video. Most current models adopt both RGB and optical-flow streams for the TAD task, so the original RGB frames must first be converted into optical-flow frames at additional computation and time cost, which is an obstacle to real-time processing. Many current models also adopt two-stage strategies, which slow inference and require complicated tuning of proposal generation. By contrast, we propose a one-stage, anchor-free temporal localization method that uses the RGB stream only, in which a novel Newtonian Mechanics-MLP architecture is established. It achieves accuracy comparable to existing state-of-the-art models while surpassing their inference speed by a large margin: the typical inference speed reported in this paper is 4.44 videos per second on THUMOS14. In practice, because no optical flow needs to be computed, inference is even faster. This also shows that MLPs have great potential in downstream tasks such as TAD. The source code is available at https://github.com/BonedDeng/TadML
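Detections in TAD are matched to ground-truth instances by temporal IoU (tIoU) between their (start, end) segments, which is the overlap criterion behind the thresholded mAP results below. A minimal sketch of tIoU for two 1-D segments (an illustration, not the paper's code):

```python
def temporal_iou(seg_a, seg_b):
    """Temporal IoU between two (start, end) segments, in frames or seconds."""
    start_a, end_a = seg_a
    start_b, end_b = seg_b
    inter = max(0.0, min(end_a, end_b) - max(start_a, start_b))  # overlap length
    union = (end_a - start_a) + (end_b - start_b) - inter        # combined span
    return inter / union if union > 0 else 0.0

# A prediction overlapping half of a ground-truth action:
print(temporal_iou((0.0, 10.0), (5.0, 15.0)))  # 5 / 15 ≈ 0.333
```

A prediction counts as correct at, say, IoU@0.5 only if its tIoU with a ground-truth segment of the same class reaches 0.5.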

Results

Task | Dataset | Metric | Value | Model
Temporal Action Localization | THUMOS'14 | Avg mAP (0.3:0.7) | 59.7 | TadML (two-stream)
Temporal Action Localization | THUMOS'14 | mAP IOU@0.3 | 73.29 | TadML (two-stream)
Temporal Action Localization | THUMOS'14 | mAP IOU@0.4 | 69.73 | TadML (two-stream)
Temporal Action Localization | THUMOS'14 | mAP IOU@0.5 | 62.53 | TadML (two-stream)
Temporal Action Localization | THUMOS'14 | mAP IOU@0.6 | 53.36 | TadML (two-stream)
Temporal Action Localization | THUMOS'14 | mAP IOU@0.7 | 39.6 | TadML (two-stream)
Temporal Action Localization | THUMOS'14 | Avg mAP (0.3:0.7) | 53.46 | TadML (rgb-only)
Temporal Action Localization | THUMOS'14 | mAP IOU@0.3 | 68.78 | TadML (rgb-only)
Temporal Action Localization | THUMOS'14 | mAP IOU@0.4 | 64.66 | TadML (rgb-only)
Temporal Action Localization | THUMOS'14 | mAP IOU@0.5 | 56.61 | TadML (rgb-only)
Temporal Action Localization | THUMOS'14 | mAP IOU@0.6 | 45.4 | TadML (rgb-only)
Temporal Action Localization | THUMOS'14 | mAP IOU@0.7 | 31.88 | TadML (rgb-only)
Action Detection | THUMOS'14 | mAP | 59.7 | TadML (two-stream)
Action Detection | THUMOS'14 | mAP | 53.46 | TadML (rgb-only)
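The "Avg mAP (0.3:0.7)" figure is the mean of the per-threshold mAP values at tIoU 0.3 through 0.7, which can be checked directly from the two-stream row values above:

```python
# Per-threshold mAP for TadML (two-stream) on THUMOS'14, from the results table.
map_at_tiou = {0.3: 73.29, 0.4: 69.73, 0.5: 62.53, 0.6: 53.36, 0.7: 39.6}

avg_map = sum(map_at_tiou.values()) / len(map_at_tiou)
print(round(avg_map, 2))  # 59.7
```

(The rgb-only rounded values average to 53.47 rather than the reported 53.46, presumably because the reported average was computed from unrounded per-threshold numbers.)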

Related Papers

Channel-wise Motion Features for Efficient Motion Segmentation (2025-07-17)
DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
An Efficient Approach for Muscle Segmentation and 3D Reconstruction Using Keypoint Tracking in MRI Scan (2025-07-11)
Learning to Track Any Points from Human Motion (2025-07-08)
TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation (2025-07-07)
MEMFOF: High-Resolution Training for Memory-Efficient Multi-Frame Optical Flow Estimation (2025-06-29)
EndoFlow-SLAM: Real-Time Endoscopic SLAM with Flow-Constrained Gaussian Splatting (2025-06-26)
WAFT: Warping-Alone Field Transforms for Optical Flow (2025-06-26)