Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


ReAct: Temporal Action Detection with Relational Queries

Dingfeng Shi, Yujie Zhong, Qiong Cao, Jing Zhang, Lin Ma, Jia Li, DaCheng Tao

2022-07-14
Tasks: Action Detection, Action Classification, Classification, Temporal Action Localization, Object Detection

Abstract

This work aims to advance temporal action detection (TAD) using an encoder-decoder framework with action queries, similar to DETR, which has shown great success in object detection. However, when applied directly to TAD, the framework suffers from three problems: insufficient exploration of inter-query relations in the decoder, inadequate classification training due to the limited number of training samples, and unreliable classification scores at inference. To address these, we first propose a relational attention mechanism in the decoder, which guides the attention among queries based on their relations. Second, we propose two losses that facilitate and stabilize the training of action classification. Lastly, we propose predicting the localization quality of each action query at inference in order to distinguish high-quality queries. The proposed method, named ReAct, achieves state-of-the-art performance on THUMOS14 at a much lower computational cost than previous methods. Extensive ablation studies verify the effectiveness of each proposed component. The code is available at https://github.com/sssste/React.
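The abstract names the relational attention mechanism and the localization-quality score but does not spell out the relation rule or the scoring formula. The sketch below is one plausible reading in plain Python, under explicit assumptions: each action query carries a predicted segment `(start, end)` and a feature vector; a hypothetical rule lets query i attend to query j only when their features are similar (cosine similarity above `sim_thr`) but their segments are not near-duplicates (temporal IoU below `iou_thr`); and the inference score is assumed to be the classification score multiplied by a predicted localization quality. The attention omits learned query/key/value projections and multiple heads for brevity. All function names and thresholds are illustrative and are not taken from the official code.

```python
import math

# Hypothetical sketch: relation-guided query attention and quality re-scoring.
# NOT the official ReAct implementation; the relation rule and thresholds are assumptions.


def temporal_iou(a, b):
    """Temporal IoU of two segments given as (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def cosine(u, v):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def relational_mask(segments, feats, iou_thr=0.5, sim_thr=0.0):
    """Assumed relation rule: query i may attend to query j if their features are
    similar (cosine > sim_thr) and their segments are not near-duplicates
    (tIoU < iou_thr). Every query may always attend to itself."""
    n = len(segments)
    return [
        [
            i == j
            or (cosine(feats[i], feats[j]) > sim_thr
                and temporal_iou(segments[i], segments[j]) < iou_thr)
            for j in range(n)
        ]
        for i in range(n)
    ]


def masked_self_attention(feats, mask):
    """Single-head scaled dot-product self-attention over query features;
    mask[i][j] == False removes key j from query i's softmax."""
    d = len(feats[0])
    out = []
    for i, q in enumerate(feats):
        logits = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d) if mask[i][j] else -math.inf
            for j, k in enumerate(feats)
        ]
        top = max(logits)  # finite, because every query attends to itself
        weights = [math.exp(l - top) for l in logits]  # exp(-inf) == 0.0
        total = sum(weights)
        out.append([sum(w * k[c] for w, k in zip(weights, feats)) / total for c in range(d)])
    return out


def rescore(cls_scores, pred_quality):
    """Assumed inference rule: final score = classification score * predicted
    localization quality. Returns the scores and the query ranking (best first)."""
    scores = [c * q for c, q in zip(cls_scores, pred_quality)]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return scores, order


# Toy usage: queries 0 and 1 overlap heavily (tIoU 0.9), so they are masked from each other.
segments = [(0.0, 10.0), (1.0, 10.0), (20.0, 30.0)]
feats = [[1.0, 0.0], [1.0, 0.1], [1.0, 0.2]]
mask = relational_mask(segments, feats)
print(mask)  # [[True, False, True], [False, True, True], [True, True, True]]
refined = masked_self_attention(feats, mask)
print(rescore([0.9, 0.8], [0.3, 0.9])[1])  # [1, 0]
```

In this toy run, a confident but poorly localized query (score 0.9, quality 0.3) drops below a slightly less confident, well-localized one, which illustrates why re-ranking by a localization-quality estimate can help when classification scores alone are unreliable.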

Results

Dataset: THUMOS’14. Model: ReAct (TSN features). The same values are listed under four tasks: Video, Temporal Action Localization, Zero-Shot Learning, and Action Localization.

| Metric | Value (%) |
| --- | --- |
| mAP @ tIoU 0.3 | 69.2 |
| mAP @ tIoU 0.4 | 65 |
| mAP @ tIoU 0.5 | 57.1 |
| mAP @ tIoU 0.6 | 47.8 |
| mAP @ tIoU 0.7 | 35.6 |
| Avg mAP (tIoU 0.3:0.7) | 55 |
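The metrics above follow the usual temporal-detection protocol: a detection counts as correct at threshold t when its temporal IoU (tIoU) with a not-yet-matched ground-truth segment of the same class is at least t; AP is computed per class and averaged into mAP; and "Avg mAP (0.3:0.7)" is the mean of the mAP values at t = 0.3, 0.4, 0.5, 0.6 and 0.7. The sketch below is a simplified illustration, assuming a single class in a single video, greedy highest-score-first matching, and the interpolated precision envelope used by ActivityNet-style evaluation code; the official THUMOS’14 evaluation may differ in details. Function names are illustrative.

```python
# Simplified sketch of tIoU-thresholded AP and average mAP (one class, one video).


def temporal_iou(a, b):
    """Temporal IoU of two segments given as (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(dets, gts, thr):
    """dets: list of (start, end, score); gts: list of (start, end).
    Each ground truth can be matched at most once."""
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    tp = []
    for start, end, _ in sorted(dets, key=lambda d: d[2], reverse=True):
        # Pick the best-overlapping ground truth that is still unmatched.
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if not matched[j]:
                iou = temporal_iou((start, end), gt)
                if iou > best_iou:
                    best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= thr:
            matched[best_j] = True
            tp.append(1)
        else:
            tp.append(0)
    # Cumulative precision and recall down the ranked list.
    prec, rec, hits = [], [], 0
    for rank, hit in enumerate(tp, start=1):
        hits += hit
        prec.append(hits / rank)
        rec.append(hits / len(gts))
    # Make precision non-increasing in recall, then sum the area under the step curve.
    mprec = [0.0] + prec + [0.0]
    mrec = [0.0] + rec + [1.0]
    for i in range(len(mprec) - 2, -1, -1):
        mprec[i] = max(mprec[i], mprec[i + 1])
    return sum((mrec[i] - mrec[i - 1]) * mprec[i] for i in range(1, len(mrec)))


def average_map(dets, gts, thresholds=(0.3, 0.4, 0.5, 0.6, 0.7)):
    """Mean AP over the tIoU thresholds used for 'Avg mAP (0.3:0.7)'."""
    return sum(average_precision(dets, gts, t) for t in thresholds) / len(thresholds)


# Averaging the five per-threshold values reported above for ReAct (TSN features):
reported = {0.3: 69.2, 0.4: 65.0, 0.5: 57.1, 0.6: 47.8, 0.7: 35.6}
print(round(sum(reported.values()) / len(reported), 2))  # 54.94
```

The mean of the reported per-threshold values, 54.94, rounds to the listed 55. As a toy check of the thresholding, a single detection (0, 6.5) against a ground truth (0, 10) has tIoU 0.65, so it is a hit at 0.3 through 0.6 and a miss at 0.7, giving an average of 0.8.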

Related Papers

- Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
- A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
- RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)
- Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection (2025-07-17)
- Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis (2025-07-17)
- Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)
- Safeguarding Federated Learning-based Road Condition Classification (2025-07-16)
- DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)