Leveraging triplet loss for unsupervised action segmentation

E. Bueno-Benito, B. Tura, M. Dimiccoli

2023-04-13

Action Segmentation, Unsupervised Action Segmentation, Metric Learning, Segmentation, Clustering, Video Understanding


Abstract

In this paper, we propose a novel fully unsupervised framework that learns action representations suitable for the action segmentation task from the single input video itself, without requiring any training data. Our method is a deep metric learning approach rooted in a shallow network with a triplet loss operating on similarity distributions and a novel triplet selection strategy that effectively models temporal and semantic priors to discover actions in the new representational space. Under these circumstances, we successfully recover temporal boundaries in the learned action representations with higher quality compared with existing unsupervised approaches. The proposed method is evaluated on two widely used benchmark datasets for the action segmentation task and it achieves competitive performance by applying a generic clustering algorithm on the learned representations.
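
The abstract describes a triplet loss combined with a triplet selection strategy driven by temporal priors. The following is a minimal sketch of that general idea, not the authors' method: it assumes a standard margin-based triplet loss on Euclidean distances (the paper's loss operates on similarity distributions instead) and a purely temporal selection rule with no semantic prior. The function names and the parameters `pos_window`, `neg_gap`, and `margin` are hypothetical and do not come from the paper.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sample_temporal_triplets(num_frames, pos_window, neg_gap, rng):
    """Pick one (anchor, positive, negative) frame triplet per anchor.

    Assumption for illustration: frames close in time share an action
    (positive), frames far apart in time do not (negative).
    """
    triplets = []
    for a in range(num_frames):
        positives = [i for i in range(num_frames)
                     if i != a and abs(i - a) <= pos_window]
        negatives = [i for i in range(num_frames) if abs(i - a) >= neg_gap]
        if positives and negatives:
            triplets.append((a, rng.choice(positives), rng.choice(negatives)))
    return triplets


def triplet_margin_loss(embeddings, triplets, margin=1.0):
    """Mean of max(0, d(a, p) - d(a, n) + margin) over all triplets."""
    losses = [
        max(0.0, euclidean(embeddings[a], embeddings[p])
            - euclidean(embeddings[a], embeddings[n]) + margin)
        for a, p, n in triplets
    ]
    return sum(losses) / len(losses) if losses else 0.0


# Toy "video": 10 frames whose 1-D embedding equals the frame index.
rng = random.Random(0)
embeddings = [[float(t)] for t in range(10)]
triplets = sample_temporal_triplets(10, pos_window=2, neg_gap=5, rng=rng)
loss_separated = triplet_margin_loss(embeddings, triplets, margin=1.0)

# Degenerate embedding: every frame maps to the same point.
collapsed = [[0.0] for _ in range(10)]
loss_collapsed = triplet_margin_loss(collapsed, triplets, margin=1.0)
```

In this toy setup the loss is zero when temporally distant frames are already farther apart than nearby ones by at least the margin, and it equals the margin when all frames collapse to one point. In a real pipeline the embeddings would come from a trainable network and the loss would be minimized by gradient descent.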

Results

Task	Dataset	Metric	Value (%)	Model
Action Localization	Breakfast	Acc	65.1	TSA (FINCH)
Action Localization	Breakfast	mIoU	52.1	TSA (FINCH)
Action Localization	Breakfast	Acc	63.7	TSA (Kmeans)
Action Localization	Breakfast	F1	58	TSA (Kmeans)
Action Localization	Breakfast	mIoU	53.3	TSA (Kmeans)
Action Localization	Breakfast	Acc	63.2	TSA (Spectral)
Action Localization	Breakfast	F1	57.8	TSA (Spectral)
Action Localization	Breakfast	mIoU	52.7	TSA (Spectral)
Action Localization	Youtube INRIA Instructional	Acc	62.4	TSA (FINCH)
Action Localization	Youtube INRIA Instructional	F1	54.7	TSA (FINCH)
Action Localization	Youtube INRIA Instructional	Acc	59.7	TSA (Kmeans)
Action Localization	Youtube INRIA Instructional	F1	55.3	TSA (Kmeans)
Action Segmentation	Breakfast	Acc	65.1	TSA (FINCH)
Action Segmentation	Breakfast	mIoU	52.1	TSA (FINCH)
Action Segmentation	Breakfast	Acc	63.7	TSA (Kmeans)
Action Segmentation	Breakfast	F1	58	TSA (Kmeans)
Action Segmentation	Breakfast	mIoU	53.3	TSA (Kmeans)
Action Segmentation	Breakfast	Acc	63.2	TSA (Spectral)
Action Segmentation	Breakfast	F1	57.8	TSA (Spectral)
Action Segmentation	Breakfast	mIoU	52.7	TSA (Spectral)
Action Segmentation	Youtube INRIA Instructional	Acc	62.4	TSA (FINCH)
Action Segmentation	Youtube INRIA Instructional	F1	54.7	TSA (FINCH)
Action Segmentation	Youtube INRIA Instructional	Acc	59.7	TSA (Kmeans)
Action Segmentation	Youtube INRIA Instructional	F1	55.3	TSA (Kmeans)
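
As a rough illustration of how frame-level metrics like the Acc and mIoU entries above can be computed, the sketch below scores a predicted label sequence against a ground-truth sequence. It assumes the predicted cluster indices have already been mapped to ground-truth action labels; this page does not state the evaluation protocol, so the matching step and the exact metric definitions used in the paper may differ. The function names are hypothetical.

```python
def frame_accuracy(pred, gt):
    """Fraction of frames whose predicted label equals the ground truth."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same number of frames")
    return sum(p == g for p, g in zip(pred, gt)) / len(gt)


def mean_iou(pred, gt):
    """Average per-class intersection over union, over ground-truth classes."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same number of frames")
    ious = []
    for c in sorted(set(gt)):
        inter = sum(p == c and g == c for p, g in zip(pred, gt))
        union = sum(p == c or g == c for p, g in zip(pred, gt))
        ious.append(inter / union)  # union > 0 because c occurs in gt
    return sum(ious) / len(ious)


# Toy example: 8 frames, 3 actions, one frame mislabeled at a boundary.
gt = [0, 0, 0, 1, 1, 1, 2, 2]
pred = [0, 0, 1, 1, 1, 1, 2, 2]
acc = frame_accuracy(pred, gt)
miou = mean_iou(pred, gt)
```

Both functions return fractions in [0, 1]; multiply by 100 to compare with scores reported on a 0 to 100 scale.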
