Hierarchical Vector Quantization for Unsupervised Action Segmentation

Federico Spurio, Emad Bahrami, Gianpiero Francesca, Juergen Gall

2024-12-23
Action Segmentation, Representation Learning, Unsupervised Action Segmentation, Quantization, Temporal Action Segmentation, Clustering

Abstract

In this work, we address unsupervised temporal action segmentation, which segments a set of long, untrimmed videos into semantically meaningful segments that are consistent across videos. While recent approaches combine representation learning and clustering in a single step for this task, they do not cope with large variations within temporal segments of the same class. To address this limitation, we propose a novel method, termed Hierarchical Vector Quantization (HVQ), that consists of two subsequent vector quantization modules. This results in a hierarchical clustering where the additional subclusters cover the variations within a cluster. We demonstrate that our approach captures the distribution of segment lengths much better than the state of the art. To this end, we introduce a new metric based on the Jensen-Shannon Distance (JSD) for unsupervised temporal action segmentation. We evaluate our approach on three public datasets, namely Breakfast, YouTube Instructional and IKEA ASM. Our approach outperforms the state of the art in terms of F1 score, recall and JSD.
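
The two-stage quantization described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes both codebooks are already learned, uses squared Euclidean distance, and assigns each frame first to its nearest subcluster prototype and then maps that prototype to its nearest cluster prototype. The names `nearest`, `hierarchical_quantize`, `fine_codebook`, and `coarse_codebook` are hypothetical and do not come from the paper.

```python
def nearest(vec, codebook):
    """Return the index of the codebook entry closest to vec (squared Euclidean)."""
    best_index, best_dist = 0, float("inf")
    for i, entry in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(vec, entry))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def hierarchical_quantize(frames, fine_codebook, coarse_codebook):
    """Assign each frame a subcluster id and a cluster id (illustrative only).

    Assumption: the second stage quantizes the selected subcluster prototype,
    not the raw frame embedding.
    """
    fine_ids, coarse_ids = [], []
    for frame in frames:
        j = nearest(frame, fine_codebook)               # first VQ stage
        k = nearest(fine_codebook[j], coarse_codebook)  # second VQ stage
        fine_ids.append(j)
        coarse_ids.append(k)
    return fine_ids, coarse_ids


# Toy data: four subcluster prototypes that group into two clusters.
fine_codebook = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
coarse_codebook = [[0.0, 0.5], [10.0, 10.5]]
frames = [[0.1, 0.1], [0.0, 0.9], [9.8, 10.1], [10.0, 11.2]]
fine_ids, coarse_ids = hierarchical_quantize(frames, fine_codebook, coarse_codebook)
```

In this toy example the four frames fall into four different subclusters but only two clusters, which is the sense in which subclusters can absorb variation within a cluster.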

Results

Task                 Dataset                      Metric     Value  Model
Action Localization  IKEA ASM                     Accuracy   51.2   HVQ
Action Localization  IKEA ASM                     F1         30.7   HVQ
Action Localization  IKEA ASM                     JSD        64.8   HVQ
Action Localization  IKEA ASM                     Precision  37.7   HVQ
Action Localization  IKEA ASM                     Recall     25.9   HVQ
Action Localization  Youtube INRIA Instructional  Accuracy   50.3   HVQ
Action Localization  Youtube INRIA Instructional  F1         35.1   HVQ
Action Localization  Youtube INRIA Instructional  Precision  32.1   HVQ
Action Localization  Youtube INRIA Instructional  Recall     38.7   HVQ
Action Localization  Breakfast                    Accuracy   54.4   HVQ
Action Localization  Breakfast                    F1         39.7   HVQ
Action Localization  Breakfast                    JSD        82.5   HVQ
Action Localization  Breakfast                    Precision  35.6   HVQ
Action Localization  Breakfast                    Recall     44.9   HVQ
Action Segmentation  IKEA ASM                     Accuracy   51.2   HVQ
Action Segmentation  IKEA ASM                     F1         30.7   HVQ
Action Segmentation  IKEA ASM                     JSD        64.8   HVQ
Action Segmentation  IKEA ASM                     Precision  37.7   HVQ
Action Segmentation  IKEA ASM                     Recall     25.9   HVQ
Action Segmentation  Youtube INRIA Instructional  Accuracy   50.3   HVQ
Action Segmentation  Youtube INRIA Instructional  F1         35.1   HVQ
Action Segmentation  Youtube INRIA Instructional  Precision  32.1   HVQ
Action Segmentation  Youtube INRIA Instructional  Recall     38.7   HVQ
Action Segmentation  Breakfast                    Accuracy   54.4   HVQ
Action Segmentation  Breakfast                    F1         39.7   HVQ
Action Segmentation  Breakfast                    JSD        82.5   HVQ
Action Segmentation  Breakfast                    Precision  35.6   HVQ
Action Segmentation  Breakfast                    Recall     44.9   HVQ
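
The JSD rows above refer to the Jensen-Shannon Distance metric that the abstract introduces for comparing segment-length distributions. This page does not give its exact definition, binning, or scaling, so the following is only a sketch of the general idea: extract segment lengths from frame-wise labels, histogram them, and take the square root of the base-2 Jensen-Shannon divergence, which lies in [0, 1]. The helper names, `bin_width`, and `num_bins` are assumptions made for illustration.

```python
import math


def segment_lengths(labels):
    """Return the lengths of maximal runs of identical frame labels."""
    lengths = []
    run, prev = 0, None
    for label in labels:
        if run and label == prev:
            run += 1
        else:
            if run:
                lengths.append(run)
            run, prev = 1, label
    if run:
        lengths.append(run)
    return lengths


def length_histogram(lengths, bin_width, num_bins):
    """Normalized histogram of segment lengths; long segments go in the last bin."""
    if not lengths:
        raise ValueError("need at least one segment")
    counts = [0] * num_bins
    for n in lengths:
        counts[min((n - 1) // bin_width, num_bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def js_distance(p, q):
    """Jensen-Shannon distance between two discrete distributions (base-2 logs)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))


# Toy comparison of predicted vs. ground-truth frame labels for one video.
predicted = [0, 0, 1, 1, 1, 0]
ground_truth = [0, 0, 0, 1, 1, 1]
p = length_histogram(segment_lengths(predicted), bin_width=1, num_bins=4)
q = length_histogram(segment_lengths(ground_truth), bin_width=1, num_bins=4)
distance = js_distance(p, q)
```

Under this definition, identical histograms give a distance of 0 and histograms with no overlapping bins give 1. How the values reported in the table are scaled or aggregated across classes and videos is not stated here.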

Related Papers

Efficient Deployment of Spiking Neural Networks on SpiNNaker2 for DVS Gesture Recognition Using Neuromorphic Intermediate Representation (2025-09-04)
Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper (2025-07-20)
An End-to-End DNN Inference Framework for the SpiNNaker2 Neuromorphic MPSoC (2025-07-18)
Tri-Learn Graph Fusion Network for Attributed Graph Clustering (2025-07-18)
Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
Boosting Team Modeling through Tempo-Relational Representation Learning (2025-07-17)
Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine (2025-07-17)
Angle Estimation of a Single Source with Massive Uniform Circular Arrays (2025-07-17)