ASFormer: Transformer for Action Segmentation

Fangqiu Yi, Hongyu Wen, Tingting Jiang

2021-10-16 · Action Segmentation · Segmentation

Abstract

Algorithms for the action segmentation task typically use temporal models to predict which action is occurring at each frame of a minute-long daily activity. Recent studies have shown the potential of the Transformer for modeling relations among elements in sequential data. However, several major concerns arise when the Transformer is applied directly to action segmentation: the lack of inductive biases when training sets are small, the difficulty of processing long input sequences, and the limited ability of the decoder architecture to exploit temporal relations among multiple action segments to refine the initial predictions. To address these concerns, we design an efficient Transformer-based model for action segmentation, named ASFormer, with three distinctive characteristics: (i) We explicitly introduce local connectivity inductive priors because the features are highly local. These priors constrain the hypothesis space to a reliable scope, which helps the model learn a proper target function from small training sets. (ii) We apply a pre-defined hierarchical representation pattern that efficiently handles long input sequences. (iii) We carefully design the decoder to refine the initial predictions from the encoder. Extensive experiments on three public datasets demonstrate the effectiveness of our method. Code is available at https://github.com/ChinaYi/ASFormer.
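The abstract describes the local connectivity prior (i) and the pre-defined hierarchical pattern (ii) only in general terms. The NumPy sketch below shows one way the two ideas can fit together: self-attention restricted to a local window whose reach doubles with each layer (1, 2, 4, ...). The doubling schedule, the residual connection, and the omission of learned Q/K/V projections, multiple heads, feed-forward layers, and the decoder are simplifying assumptions for illustration, not the ASFormer implementation. For readability the sketch masks a full T x T score matrix; an efficient version would compute only the roughly T(2r+1) in-window scores for reach r.

```python
import numpy as np


def local_attention_mask(T: int, reach: int) -> np.ndarray:
    """Boolean (T, T) mask: frame t may attend to frame s only if |t - s| <= reach."""
    idx = np.arange(T)
    return np.abs(idx[:, None] - idx[None, :]) <= reach


def windowed_self_attention(x: np.ndarray, reach: int) -> np.ndarray:
    """Single-head scaled dot-product self-attention restricted to a local window.

    x: (T, d) frame features. For brevity Q = K = V = x (no learned projections,
    which is a simplifying assumption of this sketch).
    """
    T, d = x.shape
    scores = x @ x.T / np.sqrt(d)
    scores = np.where(local_attention_mask(T, reach), scores, -np.inf)
    scores -= scores.max(axis=1, keepdims=True)  # stability; the diagonal is always finite
    weights = np.exp(scores)                      # out-of-window entries become exactly 0
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x


def hierarchical_encoder(x: np.ndarray, num_layers: int) -> np.ndarray:
    """Stack of local attention layers whose reach doubles per layer (1, 2, 4, ...).

    The 2**i schedule and the plain residual connection are assumptions of this sketch.
    """
    for i in range(num_layers):
        x = x + windowed_self_attention(x, reach=2 ** i)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((64, 16))  # 64 frames, 16-dim features
    out = hierarchical_encoder(feats, num_layers=5)
    print(out.shape)  # (64, 16)
```

With five layers the last layer attends up to 16 frames on either side, and the stack as a whole covers 1 + 2 + 4 + 8 + 16 = 31 frames on either side, so the temporal context widens geometrically while every individual layer stays local.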

Results

Task: Action Segmentation (identical values are also listed under Action Localization)

Dataset	Model	Acc (MoF)	Edit	F1@10%	F1@25%	F1@50%	Average F1
50 Salads	ASFormer+ASRF	85.9	81.9	85.1	85.4	79.3	-
50 Salads	ASFormer	85.6	79.6	85.1	83.4	76.0	-
Assembly101	ASFormer	38.8	30.5	33.4	29.2	21.4	-
GTEA	ASFormer	79.7	84.6	90.1	88.8	79.2	-
Breakfast	ASFormer	73.5	75.0	76.0	70.6	57.4	68.0

Acc (MoF): frame-wise accuracy, reported as MoF for Assembly101. "-" = not reported.
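The table reports frame-wise accuracy (Acc/MoF), the segmental edit score, and segmental F1 at IoU thresholds of 10%, 25%, and 50%. The sketch below computes these metrics as they are commonly defined in the action segmentation literature. It is not the official evaluation script: details such as background-class handling may differ, and the `bg` parameter and all function names are hypothetical conveniences.

```python
from itertools import groupby


def to_segments(labels, bg=()):
    """Collapse frame-wise labels into (label, start, end) segments, end exclusive."""
    segs, t = [], 0
    for lab, run in groupby(labels):
        n = len(list(run))
        if lab not in bg:
            segs.append((lab, t, t + n))
        t += n
    return segs


def frame_accuracy(pred, gt):
    """Acc / MoF: percentage of frames whose predicted label matches the ground truth."""
    assert len(pred) == len(gt)
    return 100.0 * sum(p == g for p, g in zip(pred, gt)) / len(gt)


def levenshtein(a, b):
    """Edit distance between two label sequences (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def edit_score(pred, gt, bg=()):
    """Segmental edit score: 100 * (1 - normalized edit distance of segment label orders)."""
    p = [s[0] for s in to_segments(pred, bg)]
    g = [s[0] for s in to_segments(gt, bg)]
    if not p and not g:
        return 100.0
    return 100.0 * (1 - levenshtein(p, g) / max(len(p), len(g)))


def f1_at(pred, gt, threshold, bg=()):
    """Segmental F1@k: a predicted segment is a true positive if its best-matching
    same-label ground-truth segment has IoU >= threshold and is not yet matched."""
    P, G = to_segments(pred, bg), to_segments(gt, bg)
    used = [False] * len(G)
    tp = fp = 0
    for lab, s, e in P:
        best, best_iou = -1, 0.0
        for j, (glab, gs, ge) in enumerate(G):
            if glab != lab:
                continue
            iou = (min(e, ge) - max(s, gs)) / (max(e, ge) - min(s, gs))
            if iou > best_iou:
                best, best_iou = j, iou
        if best >= 0 and best_iou >= threshold and not used[best]:
            tp += 1
            used[best] = True
        else:
            fp += 1
    fn = len(G) - sum(used)
    precision = tp / (tp + fp) if P else 0.0
    recall = tp / (tp + fn) if G else 0.0
    if precision + recall == 0:
        return 0.0
    return 100.0 * 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gt = list("aaaabbbbcc")
    pred = list("aaaabbabcc")  # one wrong frame that splits the "b" segment
    print(f"Acc {frame_accuracy(pred, gt):.1f}")  # Acc 90.0
    print(f"Edit {edit_score(pred, gt):.1f}")     # Edit 60.0
    for k in (0.10, 0.25, 0.50):
        print(f"F1@{int(k * 100)} {f1_at(pred, gt, k):.1f}")  # 75.0 for each k
```

The example shows why the table reports all three kinds of metric: a single misclassified frame barely moves frame-wise accuracy (90.0), but because it fragments one action into three segments it costs 40 points of edit score and adds two false-positive segments to every F1@k.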