Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Activity Graph Transformer for Temporal Action Localization

Megha Nawhal, Greg Mori

2021-01-21 · Action Localization · Temporal Action Localization

Abstract

We introduce Activity Graph Transformer, an end-to-end learnable model for temporal action localization that receives a video as input and directly predicts a set of action instances that appear in the video. Detecting and localizing action instances in untrimmed videos requires reasoning over multiple action instances in a video. The dominant paradigms in the literature process videos temporally to either propose action regions or directly produce frame-level detections. However, sequential processing of videos is problematic when the action instances have non-sequential dependencies and/or non-linear temporal ordering, such as overlapping action instances or recurrence of action instances over the course of the video. In this work, we capture this non-linear temporal structure by reasoning over the videos as non-sequential entities in the form of graphs. We evaluate our model on three challenging datasets: THUMOS14, Charades, and EPIC-Kitchens-100. Our results show that our proposed model outperforms the state-of-the-art by a considerable margin.
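The abstract does not spell out the architecture, but its core idea is to treat latent action-instance representations as nodes of a fully connected graph and reason over them jointly rather than sequentially. As a rough illustration only (not the paper's actual model; all names, dimensions, and weight initializations here are invented for the sketch), one round of attention-style message passing over instance nodes can be written in a few lines of NumPy:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def graph_attention_step(nodes, rng):
    """One attention-style message-passing step over a fully connected
    graph of instance nodes: every node aggregates information from
    every other node, with no assumption of temporal order."""
    d = nodes.shape[1]
    # Random projection weights stand in for learned parameters.
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = nodes @ Wq, nodes @ Wk, nodes @ Wv
    attn = softmax(q @ k.T / np.sqrt(d))  # dense pairwise edge weights
    return nodes + attn @ v               # residual node update

rng = np.random.default_rng(0)
nodes = rng.standard_normal((10, 16))  # 10 latent action-instance nodes
out = graph_attention_step(nodes, rng)
print(out.shape)  # (10, 16)
```

Because every node attends to every other node, overlapping or recurring action instances can exchange information directly, which is exactly the kind of non-sequential dependency the abstract argues frame-by-frame processing handles poorly.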

Results

All rows are for the AGT (Ours) model on THUMOS'14; the same figures are indexed under the Video, Temporal Action Localization, Zero-Shot Learning, and Action Localization leaderboards.

Metric        Value
mAP IoU@0.1   72.1
mAP IoU@0.2   69.8
mAP IoU@0.3   65.0
mAP IoU@0.4   58.1
mAP IoU@0.5   50.2
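The mAP IoU@t metrics above score predicted temporal segments against ground truth at increasing temporal-IoU thresholds t. As a minimal sketch of the standard protocol (not code from the paper; the greedy matching below is the common convention, with predictions assumed pre-sorted by confidence), the true-positive test at a given threshold looks like this:

```python
def temporal_iou(a, b):
    """Intersection-over-union of two (start, end) segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def true_positives(preds, gts, thresh):
    """Greedy matching: a prediction is a true positive if it overlaps
    a still-unmatched ground-truth segment with IoU >= thresh.
    `preds` is assumed sorted by confidence, highest first."""
    matched = set()
    tp = 0
    for p in preds:
        for i, g in enumerate(gts):
            if i not in matched and temporal_iou(p, g) >= thresh:
                matched.add(i)
                tp += 1
                break
    return tp

preds = [(0.0, 2.0), (5.0, 9.0)]   # predicted segments, in seconds
gts = [(0.5, 2.5), (6.0, 9.0)]     # ground-truth segments
print(true_positives(preds, gts, 0.5))  # 2
```

Average precision is then computed per class from these matches across the precision-recall curve, and mAP averages over classes; raising the threshold from 0.1 to 0.5 demands increasingly precise boundaries, which is why the table's values fall as t grows.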

Related Papers

DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)
Zero-Shot Temporal Interaction Localization for Egocentric Videos (2025-06-04)
A Review on Coarse to Fine-Grained Animal Action Recognition (2025-06-01)
LLM-powered Query Expansion for Enhancing Boundary Prediction in Language-driven Action Localization (2025-05-30)
CLIP-AE: CLIP-assisted Cross-view Audio-Visual Enhancement for Unsupervised Temporal Action Localization (2025-05-29)
DeepConvContext: A Multi-Scale Approach to Timeseries Classification in Human Activity Recognition (2025-05-27)
ProTAL: A Drag-and-Link Video Programming Framework for Temporal Action Localization (2025-05-23)