Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Unsupervised Action Segmentation by Joint Representation Learning and Online Clustering

Sateesh Kumar, Sanjay Haresh, Awais Ahmed, Andrey Konin, M. Zeeshan Zia, Quoc-Huy Tran

2021-05-27 | CVPR 2022 | Action Segmentation, Online Clustering, Representation Learning, Unsupervised Action Segmentation, Clustering

Abstract

We present a novel approach for unsupervised activity segmentation which uses video frame clustering as a pretext task and simultaneously performs representation learning and online clustering. This is in contrast with prior works where representation learning and clustering are often performed sequentially. We leverage temporal information in videos by employing temporal optimal transport. In particular, we incorporate a temporal regularization term which preserves the temporal order of the activity into the standard optimal transport module for computing pseudo-label cluster assignments. The temporal optimal transport module enables our approach to learn effective representations for unsupervised activity segmentation. Furthermore, previous methods require storing learned features for the entire dataset before clustering them in an offline manner, whereas our approach processes one mini-batch at a time in an online manner. Extensive evaluations on three public datasets, i.e., 50-Salads, YouTube Instructions, and Breakfast, and our dataset, i.e., Desktop Assembly, show that our approach performs on par with or better than previous methods, despite having significantly lower memory requirements. Our code and dataset are available on our research website: https://retrocausal.ai/research/
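
To make the idea of temporal optimal transport more concrete, here is a minimal sketch of how pseudo-label assignments for one mini-batch might be computed. It is not the authors' implementation. It assumes that the temporal regularization can be approximated by multiplying the usual Sinkhorn kernel with a Gaussian-shaped prior that favors assigning early frames to early clusters and late frames to late clusters. The function names (`temporal_prior`, `temporal_ot`, `pseudo_labels`) and the parameters `sigma`, `eps`, and `n_iters` are hypothetical and chosen for illustration only.

```python
import math


def temporal_prior(num_frames, num_clusters, sigma=0.3):
    """Prior that peaks where a frame's relative position in the video
    matches a cluster's relative position (assumed form, illustration only)."""
    prior = []
    for i in range(num_frames):
        t = (i + 0.5) / num_frames          # relative time of frame i
        row = []
        for j in range(num_clusters):
            c = (j + 0.5) / num_clusters    # relative position of cluster j
            row.append(math.exp(-((t - c) ** 2) / (2 * sigma ** 2)))
        prior.append(row)
    return prior


def temporal_ot(scores, prior, eps=0.1, n_iters=50):
    """Sinkhorn iterations on kernel = prior * exp(scores / eps).

    scores[i][j] is the similarity between frame i and cluster prototype j,
    with frames given in temporal order. Returns soft assignments Q whose
    rows sum to 1/num_frames; column sums approach 1/num_clusters.
    """
    n, k = len(scores), len(scores[0])
    top = max(max(row) for row in scores)   # subtract the max for stability
    kernel = [[prior[i][j] * math.exp((scores[i][j] - top) / eps)
               for j in range(k)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * k
    for _ in range(n_iters):
        # Alternately rescale columns and rows toward uniform marginals.
        v = [(1.0 / k) / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(k)]
        u = [(1.0 / n) / sum(kernel[i][j] * v[j] for j in range(k))
             for i in range(n)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(k)] for i in range(n)]


def pseudo_labels(q):
    """Hard pseudo-label for each frame: the cluster with the largest mass."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]
```

In an online setting, a routine like this would be called once per mini-batch on the similarities between the current frame embeddings and the cluster prototypes, so no dataset-wide feature store is needed. The resulting pseudo-labels could then serve as targets for a classification-style loss on the encoder, which is how clustering and representation learning would be coupled in this sketch.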

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Action Localization | IKEA ASM | Accuracy | 23.8 | TOT+TCL |
| Action Localization | IKEA ASM | F1 | 20.9 | TOT+TCL |
| Action Localization | IKEA ASM | JSD | 79.5 | TOT+TCL |
| Action Localization | IKEA ASM | Precision | 25.5 | TOT+TCL |
| Action Localization | IKEA ASM | Recall | 17.7 | TOT+TCL |
| Action Localization | IKEA ASM | Accuracy | 21 | TOT |
| Action Localization | IKEA ASM | F1 | 20.1 | TOT |
| Action Localization | IKEA ASM | JSD | 80 | TOT |
| Action Localization | IKEA ASM | Precision | 24.4 | TOT |
| Action Localization | IKEA ASM | Recall | 17.1 | TOT |
| Action Localization | 50 Salads | Acc | 45.3 | TOT+TCL |
| Action Localization | 50 Salads | F1 | 32.9 | TOT+TCL |
| Action Localization | 50 Salads | Acc | 40.6 | TOT |
| Action Localization | 50 Salads | F1 | 30 | TOT |
| Action Localization | Youtube INRIA Instructional | Acc | 45.3 | TOT+TCL |
| Action Localization | Youtube INRIA Instructional | F1 | 32.9 | TOT+TCL |
| Action Localization | Youtube INRIA Instructional | Precision | 40.1 | TOT+TCL |
| Action Localization | Youtube INRIA Instructional | Recall | 27.9 | TOT+TCL |
| Action Localization | Youtube INRIA Instructional | Acc | 40.6 | TOT |
| Action Localization | Youtube INRIA Instructional | F1 | 30 | TOT |
| Action Localization | Youtube INRIA Instructional | Precision | 28.7 | TOT |
| Action Localization | Youtube INRIA Instructional | Recall | 31.4 | TOT |
| Action Localization | Breakfast | Acc | 47.5 | TOT |
| Action Localization | Breakfast | F1 | 31 | TOT |
| Action Localization | Breakfast | JSD | 90.2 | TOT |
| Action Localization | Breakfast | Precision | 37.7 | TOT |
| Action Localization | Breakfast | Recall | 26.3 | TOT |
| Action Localization | Breakfast | Acc | 39 | TOT+TCL |
| Action Localization | Breakfast | F1 | 30.3 | TOT+TCL |
| Action Localization | Breakfast | JSD | 85.6 | TOT+TCL |
| Action Localization | Breakfast | Precision | 26.2 | TOT+TCL |
| Action Localization | Breakfast | Recall | 36 | TOT+TCL |
| Action Segmentation | IKEA ASM | Accuracy | 23.8 | TOT+TCL |
| Action Segmentation | IKEA ASM | F1 | 20.9 | TOT+TCL |
| Action Segmentation | IKEA ASM | JSD | 79.5 | TOT+TCL |
| Action Segmentation | IKEA ASM | Precision | 25.5 | TOT+TCL |
| Action Segmentation | IKEA ASM | Recall | 17.7 | TOT+TCL |
| Action Segmentation | IKEA ASM | Accuracy | 21 | TOT |
| Action Segmentation | IKEA ASM | F1 | 20.1 | TOT |
| Action Segmentation | IKEA ASM | JSD | 80 | TOT |
| Action Segmentation | IKEA ASM | Precision | 24.4 | TOT |
| Action Segmentation | IKEA ASM | Recall | 17.1 | TOT |
| Action Segmentation | 50 Salads | Acc | 45.3 | TOT+TCL |
| Action Segmentation | 50 Salads | F1 | 32.9 | TOT+TCL |
| Action Segmentation | 50 Salads | Acc | 40.6 | TOT |
| Action Segmentation | 50 Salads | F1 | 30 | TOT |
| Action Segmentation | Youtube INRIA Instructional | Acc | 45.3 | TOT+TCL |
| Action Segmentation | Youtube INRIA Instructional | F1 | 32.9 | TOT+TCL |
| Action Segmentation | Youtube INRIA Instructional | Precision | 40.1 | TOT+TCL |
| Action Segmentation | Youtube INRIA Instructional | Recall | 27.9 | TOT+TCL |
| Action Segmentation | Youtube INRIA Instructional | Acc | 40.6 | TOT |
| Action Segmentation | Youtube INRIA Instructional | F1 | 30 | TOT |
| Action Segmentation | Youtube INRIA Instructional | Precision | 28.7 | TOT |
| Action Segmentation | Youtube INRIA Instructional | Recall | 31.4 | TOT |
| Action Segmentation | Breakfast | Acc | 47.5 | TOT |
| Action Segmentation | Breakfast | F1 | 31 | TOT |
| Action Segmentation | Breakfast | JSD | 90.2 | TOT |
| Action Segmentation | Breakfast | Precision | 37.7 | TOT |
| Action Segmentation | Breakfast | Recall | 26.3 | TOT |
| Action Segmentation | Breakfast | Acc | 39 | TOT+TCL |
| Action Segmentation | Breakfast | F1 | 30.3 | TOT+TCL |
| Action Segmentation | Breakfast | JSD | 85.6 | TOT+TCL |
| Action Segmentation | Breakfast | Precision | 26.2 | TOT+TCL |
| Action Segmentation | Breakfast | Recall | 36 | TOT+TCL |
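
The page does not say how the F1 values are computed. Assuming the usual definition, the harmonic mean of precision and recall, the rows that report all three metrics can be cross-checked with a short helper. The function name `f1_from_precision_recall` is hypothetical, and the inputs below are taken from the IKEA ASM and Breakfast rows for TOT+TCL.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (both on the same scale, e.g. percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# IKEA ASM, TOT+TCL: precision 25.5, recall 17.7
ikea_f1 = round(f1_from_precision_recall(25.5, 17.7), 1)

# Breakfast, TOT+TCL: precision 26.2, recall 36
breakfast_f1 = round(f1_from_precision_recall(26.2, 36), 1)

print(ikea_f1, breakfast_f1)  # 20.9 30.3
```

Both values agree with the F1 entries listed for those rows, which suggests the reported F1 follows this definition. The 50 Salads rows report no precision or recall, so they cannot be checked this way.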

Related Papers

Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper (2025-07-20)
Tri-Learn Graph Fusion Network for Attributed Graph Clustering (2025-07-18)
Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
Boosting Team Modeling through Tempo-Relational Representation Learning (2025-07-17)
Similarity-Guided Diffusion for Contrastive Sequential Recommendation (2025-07-16)
Are encoders able to learn landmarkers for warm-starting of Hyperparameter Optimization? (2025-07-16)
Language-Guided Contrastive Audio-Visual Masked Autoencoder with Automatically Generated Audio-Visual-Text Triplets from Videos (2025-07-16)
Ranking Vectors Clustering: Theory and Applications (2025-07-16)