Permutation-Aware Action Segmentation via Unsupervised Frame-to-Segment Alignment

Quoc-Huy Tran, Ahmed Mehmood, Muhammad Ahmed, Muhammad Naufil, Anas Zafar, Andrey Konin, M. Zeeshan Zia

2023-05-31Action Segmentation Unsupervised Action Segmentation Segmentation Prediction

Abstract

This paper presents an unsupervised transformer-based framework for temporal activity segmentation which leverages not only frame-level cues but also segment-level cues. This is in contrast with previous methods which often rely on frame-level information only. Our approach begins with a frame-level prediction module which estimates framewise action classes via a transformer encoder. The frame-level prediction module is trained in an unsupervised manner via temporal optimal transport. To exploit segment-level information, we utilize a segment-level prediction module and a frame-to-segment alignment module. The former includes a transformer decoder for estimating video transcripts, while the latter matches frame-level features with segment-level features, yielding permutation-aware segmentation results. Moreover, inspired by temporal optimal transport, we introduce simple-yet-effective pseudo labels for unsupervised training of the above modules. Our experiments on four public datasets, i.e., 50 Salads, YouTube Instructions, Breakfast, and Desktop Assembly show that our approach achieves comparable or better performance than previous methods in unsupervised activity segmentation.

Results

Task	Dataset	Metric	Value	Model
Action Localization	Youtube INRIA Instructional	Acc	49.6	UFSA
Action Localization	Youtube INRIA Instructional	F1	32.4	UFSA
Action Localization	Breakfast	Acc	52.1	UFSA
Action Localization	Breakfast	F1	38	UFSA
Action Segmentation	Youtube INRIA Instructional	Acc	49.6	UFSA
Action Segmentation	Youtube INRIA Instructional	F1	32.4	UFSA
Action Segmentation	Breakfast	Acc	52.1	UFSA
Action Segmentation	Breakfast	F1	38	UFSA

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction2025-07-21 Multi-Strategy Improved Snake Optimizer Accelerated CNN-LSTM-Attention-Adaboost for Trajectory Prediction2025-07-21 Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction2025-07-17 DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model2025-07-17 From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation2025-07-17 Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion2025-07-17 SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation2025-07-17 Unified Medical Image Segmentation with State Space Modeling Snake2025-07-17