Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Temporal Reasoning Graph for Activity Recognition

Jingran Zhang, Fumin Shen, Xing Xu, Heng Tao Shen

2019-08-27 · Relation Extraction · Action Recognition · Temporal Relation Extraction · Activity Recognition

Paper · PDF

Abstract

Despite the great success achieved in activity analysis, many challenges remain. Most existing work on activity recognition focuses on designing efficient architectures or video sampling strategies. However, because activities comprise fine-grained actions with long-term structure in video, activity recognition requires reasoning about the temporal relations between video sequences. In this paper, we propose an efficient temporal reasoning graph (TRG) that simultaneously captures appearance features and temporal relations between video sequences at multiple time scales. Specifically, we construct learnable temporal relation graphs to explore temporal relations over multi-scale ranges. Additionally, to facilitate multi-scale temporal relation extraction, we design a multi-head temporal adjacency matrix to represent multiple kinds of temporal relations. Finally, a multi-head temporal relation aggregator extracts the semantic meaning of the features convolved through the graphs. Extensive experiments on widely used large-scale datasets such as Something-Something and Charades show that our model achieves state-of-the-art performance. Further analysis shows that temporal relation reasoning with our TRG extracts discriminative features for activity recognition.
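The core mechanism the abstract describes — per-head learnable temporal adjacency matrices whose graphs propagate frame features — can be sketched as follows. This is a minimal illustration under assumed shapes and names, not the authors' implementation; in TRG the adjacency weights would be learned, whereas here they are random stand-ins.

```python
import numpy as np

def multi_head_temporal_aggregate(features, adjacency):
    """Aggregate per-frame features through per-head temporal graphs.

    features:  (T, D) array, one D-dim appearance feature per sampled frame.
    adjacency: (H, T, T) array, one temporal adjacency matrix per head
               (a stand-in for the learned multi-head adjacency in TRG).
    Returns an (H, T, D) array of relation-aware features per head.
    """
    # Row-normalize each head's adjacency so every frame's update is a
    # weighted average over its temporal neighbours.
    norm = adjacency / adjacency.sum(axis=-1, keepdims=True)
    # Graph "convolution": propagate features along each head's edges.
    return np.einsum('htu,ud->htd', norm, features)

T, D, H = 8, 16, 4                      # frames, feature dim, heads
rng = np.random.default_rng(0)
feats = rng.standard_normal((T, D))     # per-frame appearance features
adj = rng.random((H, T, T))             # stand-in for learned adjacency
out = multi_head_temporal_aggregate(feats, adj)
print(out.shape)                        # (4, 8, 16)
```

A multi-head aggregator in the paper's sense would then fuse the H relation-aware feature sets (e.g. by concatenation or pooling) before classification; multi-scale reasoning would repeat this with graphs built over frames sampled at different temporal strides.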

Results

| Task                 | Dataset                | Metric         | Value | Model               |
|----------------------|------------------------|----------------|-------|---------------------|
| Activity Recognition | Something-Something V1 | Top-1 Accuracy | 49.7  | TRG (Inception-V3)  |
| Activity Recognition | Something-Something V1 | Top-1 Accuracy | 49.5  | TRG (ResNet-50)     |
| Activity Recognition | Something-Something V1 | Top-5 Accuracy | 86.1  | TRG (ResNet-50)     |
| Activity Recognition | Something-Something V2 | Top-1 Accuracy | 62.2  | TRG (ResNet-50)     |
| Activity Recognition | Something-Something V2 | Top-5 Accuracy | 90.3  | TRG (ResNet-50)     |
| Activity Recognition | Something-Something V2 | Top-1 Accuracy | 61.3  | TRG (Inception-V3)  |
| Activity Recognition | Something-Something V2 | Top-5 Accuracy | 91.4  | TRG (Inception-V3)  |
| Action Recognition   | Something-Something V1 | Top-1 Accuracy | 49.7  | TRG (Inception-V3)  |
| Action Recognition   | Something-Something V1 | Top-1 Accuracy | 49.5  | TRG (ResNet-50)     |
| Action Recognition   | Something-Something V1 | Top-5 Accuracy | 86.1  | TRG (ResNet-50)     |
| Action Recognition   | Something-Something V2 | Top-1 Accuracy | 62.2  | TRG (ResNet-50)     |
| Action Recognition   | Something-Something V2 | Top-5 Accuracy | 90.3  | TRG (ResNet-50)     |
| Action Recognition   | Something-Something V2 | Top-1 Accuracy | 61.3  | TRG (Inception-V3)  |
| Action Recognition   | Something-Something V2 | Top-5 Accuracy | 91.4  | TRG (Inception-V3)  |

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
ZKP-FedEval: Verifiable and Privacy-Preserving Federated Evaluation using Zero-Knowledge Proofs (2025-07-15)
DocIE@XLLM25: In-Context Learning for Information Extraction using Fully Synthetic Demonstrations (2025-07-08)
Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception (2025-06-26)
Multiple Streams of Relation Extraction: Enriching and Recalling in Transformers (2025-06-25)
Feature Hallucination for Self-supervised Action Recognition (2025-06-25)
CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition (2025-06-25)