Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Dual DETRs for Multi-Label Temporal Action Detection

Yuhan Zhu, Guozhen Zhang, Jing Tan, Gangshan Wu, LiMin Wang

2024-03-31 · CVPR 2024
Tasks: Action Detection, Temporal Action Localization, Object Detection

Abstract

Temporal Action Detection (TAD) aims to identify the action boundaries and the corresponding category within untrimmed videos. Inspired by the success of DETR in object detection, several methods have adapted the query-based framework to the TAD task. However, these approaches primarily followed DETR to predict actions at the instance level (i.e., identify each action by its center point), leading to sub-optimal boundary localization. To address this issue, we propose a new Dual-level query-based TAD framework, namely DualDETR, to detect actions at both the instance level and the boundary level. Decoding at different levels requires semantics of different granularity; therefore, we introduce a two-branch decoding structure. This structure builds distinctive decoding processes for different levels, facilitating explicit capture of temporal cues and semantics at each level. On top of the two-branch design, we present a joint query initialization strategy to align queries from both levels. Specifically, we leverage encoder proposals to match queries from each level in a one-to-one manner. Then, the matched queries are initialized using position and content priors from the matched action proposal. The aligned dual-level queries can refine the matched proposal with complementary cues during subsequent decoding. We evaluate DualDETR on three challenging multi-label TAD benchmarks. The experimental results demonstrate that DualDETR outperforms existing state-of-the-art methods, achieving a substantial improvement under det-mAP and delivering impressive results under seg-mAP.
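The joint query initialization described above can be illustrated with a minimal sketch. It assumes that each of the top-k encoder proposals seeds one instance-level query (with a center/width position prior) and one pair of boundary-level queries (with start/end position priors), and that all of them share the proposal's pooled feature as a content prior. The names `Proposal`, `DualQuery`, and `init_dual_queries` are hypothetical, plain tuples stand in for learned embeddings, and this is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Proposal:
    """Hypothetical encoder proposal: a scored segment plus a pooled feature."""
    start: float                 # normalized start time, assumed in [0, 1]
    end: float                   # normalized end time, assumed in [0, 1]
    score: float                 # encoder confidence used to rank proposals
    content: Tuple[float, ...]   # pooled feature standing in for the content prior


@dataclass
class DualQuery:
    """One instance-level query matched to one pair of boundary-level queries."""
    instance_ref: Tuple[float, float]   # (center, width) position prior
    boundary_refs: Tuple[float, float]  # (start, end) position priors (assumption)
    content: Tuple[float, ...]          # content prior shared by both levels


def init_dual_queries(proposals: List[Proposal], k: int) -> List[DualQuery]:
    """Build k matched query groups from the k highest-scoring proposals.

    Every group is derived from exactly one proposal, so the instance-level
    and boundary-level queries are matched one-to-one through that proposal.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top = sorted(proposals, key=lambda p: p.score, reverse=True)[:k]
    queries = []
    for p in top:
        center = (p.start + p.end) / 2   # instance level: segment center
        width = p.end - p.start          # instance level: segment length
        queries.append(DualQuery((center, width), (p.start, p.end), p.content))
    return queries


proposals = [
    Proposal(0.0, 0.5, 0.9, (1.0,)),
    Proposal(0.5, 1.0, 0.4, (2.0,)),
    Proposal(0.25, 0.75, 0.7, (3.0,)),
]
for q in init_dual_queries(proposals, k=2):
    print(q.instance_ref, q.boundary_refs, q.content)
```

The point of deriving both query levels from the same proposal is that they start aligned on the same candidate action, so later decoding can refine one segment from two complementary views rather than two unrelated ones.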

Results

Dataset | Model | Metric | Value
THUMOS’14 | DualDETR (I3D features) | Avg mAP (IoU 0.3:0.7) | 66.8
THUMOS’14 | DualDETR (I3D features) | mAP @ IoU 0.3 | 82.9
THUMOS’14 | DualDETR (I3D features) | mAP @ IoU 0.4 | 78
THUMOS’14 | DualDETR (I3D features) | mAP @ IoU 0.5 | 70.4
THUMOS’14 | DualDETR (I3D features) | mAP @ IoU 0.6 | 58.5
THUMOS’14 | DualDETR (I3D features) | mAP @ IoU 0.7 | 44.4
MultiTHUMOS | DualDETR (I3D-rgb) | Average mAP | 32.64
MultiTHUMOS | DualDETR (I3D-rgb) | mAP @ IoU 0.1 | 53.42
MultiTHUMOS | DualDETR (I3D-rgb) | mAP @ IoU 0.3 | 47.41
MultiTHUMOS | DualDETR (I3D-rgb) | mAP @ IoU 0.5 | 35.18
MultiTHUMOS | DualDETR (I3D-rgb) | mAP @ IoU 0.7 | 20.18
MultiTHUMOS | DualDETR (I3D-rgb) | mAP @ IoU 0.9 | 4.02

These identical results are listed under each of the tasks Video, Temporal Action Localization, Zero-Shot Learning, and Action Localization.
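The mAP @ IoU rows rely on temporal IoU (tIoU) between a predicted and a ground-truth segment, and the THUMOS’14 "Avg mAP (0.3:0.7)" entry is consistent with the plain arithmetic mean of the five per-threshold values. The sketch below, with hypothetical helper names `temporal_iou` and `average_map`, shows both computations; it deliberately omits the ranking and greedy matching steps of a full AP evaluation.

```python
from typing import Dict, Tuple


def temporal_iou(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Intersection over union of two 1-D segments given as (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_map(map_by_iou: Dict[float, float]) -> float:
    """Arithmetic mean of per-threshold mAP values."""
    if not map_by_iou:
        raise ValueError("need at least one threshold")
    return sum(map_by_iou.values()) / len(map_by_iou)


# A prediction can only count as a true positive at threshold t if its tIoU
# with a ground-truth segment of the same class is at least t.
print(temporal_iou((0.0, 10.0), (5.0, 15.0)))  # 5 / 15

# Per-threshold THUMOS'14 values from the results above.
thumos14 = {0.3: 82.9, 0.4: 78.0, 0.5: 70.4, 0.6: 58.5, 0.7: 44.4}
print(round(average_map(thumos14), 1))  # 66.8
```

The same shortcut does not reproduce the MultiTHUMOS entry: the mean of the five listed thresholds is about 32.04, not 32.64, so that average presumably covers additional thresholds that are not listed here.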

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)
Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection (2025-07-17)
Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis (2025-07-17)
DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)
ECORE: Energy-Conscious Optimized Routing for Deep Learning Models at the Edge (2025-07-08)