ReAct: Temporal Action Detection with Relational Queries

Dingfeng Shi, Yujie Zhong, Qiong Cao, Jing Zhang, Lin Ma, Jia Li, DaCheng Tao

2022-07-14
Tasks: Action Detection, Action Classification, Classification, Temporal Action Localization, Object Detection


Abstract

This work aims to advance temporal action detection (TAD) using an encoder-decoder framework with action queries, similar to DETR, which has shown great success in object detection. However, the framework suffers from several problems when applied directly to TAD: insufficient exploration of inter-query relations in the decoder, inadequate classification training due to a limited number of training samples, and unreliable classification scores at inference. To address these, we first propose a relational attention mechanism in the decoder, which guides the attention among queries based on their relations. Moreover, we propose two losses to facilitate and stabilize the training of action classification. Lastly, we propose to predict the localization quality of each action query at inference in order to distinguish high-quality queries. The proposed method, named ReAct, achieves state-of-the-art performance on THUMOS14 with much lower computational costs than previous methods. In addition, extensive ablation studies are conducted to verify the effectiveness of each proposed component. The code is available at https://github.com/sssste/React.

Results

Task	Dataset	Metric	Value	Model
Video	THUMOS’14	Avg mAP (0.3:0.7)	55	ReAct (TSN features)
Video	THUMOS’14	mAP IOU@0.3	69.2	ReAct (TSN features)
Video	THUMOS’14	mAP IOU@0.4	65	ReAct (TSN features)
Video	THUMOS’14	mAP IOU@0.5	57.1	ReAct (TSN features)
Video	THUMOS’14	mAP IOU@0.6	47.8	ReAct (TSN features)
Video	THUMOS’14	mAP IOU@0.7	35.6	ReAct (TSN features)
Temporal Action Localization	THUMOS’14	Avg mAP (0.3:0.7)	55	ReAct (TSN features)
Temporal Action Localization	THUMOS’14	mAP IOU@0.3	69.2	ReAct (TSN features)
Temporal Action Localization	THUMOS’14	mAP IOU@0.4	65	ReAct (TSN features)
Temporal Action Localization	THUMOS’14	mAP IOU@0.5	57.1	ReAct (TSN features)
Temporal Action Localization	THUMOS’14	mAP IOU@0.6	47.8	ReAct (TSN features)
Temporal Action Localization	THUMOS’14	mAP IOU@0.7	35.6	ReAct (TSN features)
Zero-Shot Learning	THUMOS’14	Avg mAP (0.3:0.7)	55	ReAct (TSN features)
Zero-Shot Learning	THUMOS’14	mAP IOU@0.3	69.2	ReAct (TSN features)
Zero-Shot Learning	THUMOS’14	mAP IOU@0.4	65	ReAct (TSN features)
Zero-Shot Learning	THUMOS’14	mAP IOU@0.5	57.1	ReAct (TSN features)
Zero-Shot Learning	THUMOS’14	mAP IOU@0.6	47.8	ReAct (TSN features)
Zero-Shot Learning	THUMOS’14	mAP IOU@0.7	35.6	ReAct (TSN features)
Action Localization	THUMOS’14	Avg mAP (0.3:0.7)	55	ReAct (TSN features)
Action Localization	THUMOS’14	mAP IOU@0.3	69.2	ReAct (TSN features)
Action Localization	THUMOS’14	mAP IOU@0.4	65	ReAct (TSN features)
Action Localization	THUMOS’14	mAP IOU@0.5	57.1	ReAct (TSN features)
Action Localization	THUMOS’14	mAP IOU@0.6	47.8	ReAct (TSN features)
Action Localization	THUMOS’14	mAP IOU@0.7	35.6	ReAct (TSN features)
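
For readers unfamiliar with the metrics in the table, the following is a minimal sketch of how the "Avg mAP (0.3:0.7)" row relates to the per-threshold rows, assuming it is the plain mean of the five listed values, together with the standard temporal IoU between two segments. The function names and the example segments are illustrative assumptions and are not taken from the ReAct code base.

```python
# Per-threshold mAP values copied from the results table (THUMOS'14, TSN features).
map_at_iou = {0.3: 69.2, 0.4: 65.0, 0.5: 57.1, 0.6: 47.8, 0.7: 35.6}


def temporal_iou(pred, gt):
    """Temporal IoU of two (start, end) segments, e.g. in seconds.

    A prediction counts as a match at threshold t when this value is >= t.
    """
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def average_map(table):
    """Plain mean of the per-threshold mAP values (assumed averaging rule)."""
    return sum(table.values()) / len(table)


avg = average_map(map_at_iou)
print(f"Avg mAP (0.3:0.7): {avg:.2f}")

# Hypothetical segments: the prediction overlaps half of the ground truth.
print(f"tIoU: {temporal_iou((0.0, 10.0), (5.0, 15.0)):.3f}")
```

Under this assumption the mean of the five values is 54.94, which matches the reported 55 after rounding.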
