Hierarchical Vector Quantization for Unsupervised Action Segmentation

Federico Spurio, Emad Bahrami, Gianpiero Francesca, Juergen Gall

2024-12-23

Action Segmentation, Representation Learning, Unsupervised Action Segmentation, Quantization, Temporal Action Segmentation, Clustering

Abstract

In this work, we address unsupervised temporal action segmentation, which segments a set of long, untrimmed videos into semantically meaningful segments that are consistent across videos. While recent approaches combine representation learning and clustering in a single step for this task, they do not cope with large variations within temporal segments of the same class. To address this limitation, we propose a novel method, termed Hierarchical Vector Quantization (HVQ), that consists of two subsequent vector quantization modules. This results in a hierarchical clustering where the additional subclusters cover the variations within a cluster. We demonstrate that our approach captures the distribution of segment lengths much better than the state of the art. To this end, we introduce a new metric based on the Jensen-Shannon Distance (JSD) for unsupervised temporal action segmentation. We evaluate our approach on three public datasets, namely Breakfast, YouTube Instructional and IKEA ASM. Our approach outperforms the state of the art in terms of F1 score, recall and JSD.
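The idea of two consecutive vector quantization modules can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the codebooks here are random rather than learned, the function names `nearest_code` and `hierarchical_quantize` are hypothetical, and the ordering (frames are first assigned to fine subcluster prototypes, whose vectors are then assigned to coarse cluster prototypes) is an assumption drawn from the abstract's description.

```python
import numpy as np


def nearest_code(x, codebook):
    """Return, for each row of x, the index of the closest codebook row."""
    # Pairwise squared Euclidean distances, shape (len(x), len(codebook)).
    dists = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def hierarchical_quantize(features, fine_codebook, coarse_codebook):
    """Two-stage quantization (illustrative assumption, not the paper's code).

    Stage 1 assigns every frame feature to a fine prototype (subcluster).
    Stage 2 assigns the selected fine prototype to a coarse prototype (cluster).
    """
    fine_idx = nearest_code(features, fine_codebook)
    quantized = fine_codebook[fine_idx]
    coarse_idx = nearest_code(quantized, coarse_codebook)
    return fine_idx, coarse_idx


# Toy data: 20 frames with 4-dimensional features, 6 subclusters, 3 clusters.
rng = np.random.default_rng(0)
features = rng.normal(size=(20, 4))
fine_codebook = rng.normal(size=(6, 4))
coarse_codebook = rng.normal(size=(3, 4))

fine_idx, coarse_idx = hierarchical_quantize(features, fine_codebook, coarse_codebook)
```

Because the second stage operates on the prototype chosen in the first stage, all frames that share a subcluster also share a cluster, so several subclusters can absorb the variation within one action class.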

Results

Task	Dataset	Metric	Value	Model
Action Localization	IKEA ASM	Accuracy	51.2	HVQ
Action Localization	IKEA ASM	F1	30.7	HVQ
Action Localization	IKEA ASM	JSD	64.8	HVQ
Action Localization	IKEA ASM	Precision	37.7	HVQ
Action Localization	IKEA ASM	Recall	25.9	HVQ
Action Localization	Youtube INRIA Instructional	Accuracy	50.3	HVQ
Action Localization	Youtube INRIA Instructional	F1	35.1	HVQ
Action Localization	Youtube INRIA Instructional	Precision	32.1	HVQ
Action Localization	Youtube INRIA Instructional	Recall	38.7	HVQ
Action Localization	Breakfast	Accuracy	54.4	HVQ
Action Localization	Breakfast	F1	39.7	HVQ
Action Localization	Breakfast	JSD	82.5	HVQ
Action Localization	Breakfast	Precision	35.6	HVQ
Action Localization	Breakfast	Recall	44.9	HVQ
Action Segmentation	IKEA ASM	Accuracy	51.2	HVQ
Action Segmentation	IKEA ASM	F1	30.7	HVQ
Action Segmentation	IKEA ASM	JSD	64.8	HVQ
Action Segmentation	IKEA ASM	Precision	37.7	HVQ
Action Segmentation	IKEA ASM	Recall	25.9	HVQ
Action Segmentation	Youtube INRIA Instructional	Accuracy	50.3	HVQ
Action Segmentation	Youtube INRIA Instructional	F1	35.1	HVQ
Action Segmentation	Youtube INRIA Instructional	Precision	32.1	HVQ
Action Segmentation	Youtube INRIA Instructional	Recall	38.7	HVQ
Action Segmentation	Breakfast	Accuracy	54.4	HVQ
Action Segmentation	Breakfast	F1	39.7	HVQ
Action Segmentation	Breakfast	JSD	82.5	HVQ
Action Segmentation	Breakfast	Precision	35.6	HVQ
Action Segmentation	Breakfast	Recall	44.9	HVQ
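
The JSD rows refer to the metric based on the Jensen-Shannon Distance that the abstract introduces for comparing segment-length distributions. The page does not give the exact binning, normalization, or scaling used in the paper, so the following is only a sketch under assumptions: segment lengths are counted in frames, histograms use caller-supplied bins, and the distance uses base-2 logarithms so that it lies in [0, 1]. The helper names `segment_lengths`, `js_distance`, and `length_jsd` are hypothetical.

```python
import numpy as np


def segment_lengths(labels):
    """Lengths of the maximal runs of identical labels in a frame-wise sequence."""
    labels = np.asarray(labels)
    # Positions where the label differs from the previous frame.
    change = np.flatnonzero(labels[1:] != labels[:-1]) + 1
    bounds = np.concatenate(([0], change, [len(labels)]))
    return np.diff(bounds)


def js_distance(p, q):
    """Jensen-Shannon distance (square root of the divergence), base 2."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return float(np.sqrt(max(divergence, 0.0)))  # guard against tiny negatives


def length_jsd(pred_labels, gt_labels, bins):
    """Distance between predicted and ground-truth segment-length histograms."""
    pred_hist, _ = np.histogram(segment_lengths(pred_labels), bins=bins)
    gt_hist, _ = np.histogram(segment_lengths(gt_labels), bins=bins)
    return js_distance(pred_hist, gt_hist)


# Toy example: an over-segmented prediction against a ground truth with longer runs.
gt = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
pred = [0, 1, 0, 1, 1, 1, 2, 1, 2, 2, 0, 2]
score = length_jsd(pred, gt, bins=[1, 2, 3, 4, 5])
```

A prediction whose segment lengths follow the same histogram as the ground truth yields a distance of 0, while histograms with no overlapping bins yield 1 under this base-2 convention.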
