
ASFormer: Transformer for Action Segmentation

Fangqiu Yi, Hongyu Wen, Tingting Jiang

2021-10-16 · Action Segmentation · Segmentation

Abstract

Algorithms for the action segmentation task typically use temporal models to predict what action is occurring at each frame of a minute-long daily activity. Recent studies have shown the potential of the Transformer in modeling the relations among elements in sequential data. However, there are several major concerns when directly applying the Transformer to the action segmentation task, such as the lack of inductive biases with small training sets, the difficulty of processing long input sequences, and the limited ability of the decoder architecture to use temporal relations among multiple action segments to refine the initial predictions. To address these concerns, we design an efficient Transformer-based model for the action segmentation task, named ASFormer, with three distinctive characteristics: (i) We explicitly bring in local connectivity inductive priors because of the high locality of features. This constrains the hypothesis space within a reliable scope and helps the model learn a proper target function with small training sets. (ii) We apply a pre-defined hierarchical representation pattern that efficiently handles long input sequences. (iii) We carefully design the decoder to refine the initial predictions from the encoder. Extensive experiments on three public datasets demonstrate the effectiveness of our method. Code is available at https://github.com/ChinaYi/ASFormer.
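As a rough illustration of points (i) and (ii), the sketch below builds a local attention mask in which each frame may attend only to nearby frames, and a window schedule that widens with depth. The abstract does not specify the exact scheme, so the doubling schedule, the function names, and the window definition are all assumptions made for illustration, not the authors' implementation.

```python
def local_attention_mask(num_frames, window):
    """Boolean mask: entry [i][j] is True when frame i may attend to frame j.

    Assumption: a frame attends to neighbours within window // 2 positions.
    """
    half = window // 2
    return [[abs(i - j) <= half for j in range(num_frames)]
            for i in range(num_frames)]


def hierarchical_windows(num_layers, base=2):
    """Hypothetical schedule: the window doubles at every layer (2, 4, 8, ...)."""
    return [base ** (layer + 1) for layer in range(num_layers)]


def attended_pairs(num_frames, window):
    """Count allowed (query, key) pairs, a proxy for attention cost."""
    mask = local_attention_mask(num_frames, window)
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    frames = 1000
    for layer, window in enumerate(hierarchical_windows(4)):
        pairs = attended_pairs(frames, window)
        print(f"layer {layer}: window={window}, allowed pairs={pairs}, "
              f"full attention would use {frames * frames}")
```

Under these assumptions, early layers see only a few neighbouring frames (a strong locality prior), while deeper layers cover longer temporal context, and the number of allowed pairs stays far below the quadratic cost of full attention on long sequences.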

Results

Task                 Dataset      Metric      Value  Model
Action Localization  50 Salads    Acc         85.9   ASFormer+ASRF
Action Localization  50 Salads    Edit        81.9   ASFormer+ASRF
Action Localization  50 Salads    F1@10%      85.1   ASFormer+ASRF
Action Localization  50 Salads    F1@25%      85.4   ASFormer+ASRF
Action Localization  50 Salads    F1@50%      79.3   ASFormer+ASRF
Action Localization  50 Salads    Acc         85.6   ASFormer
Action Localization  50 Salads    Edit        79.6   ASFormer
Action Localization  50 Salads    F1@10%      85.1   ASFormer
Action Localization  50 Salads    F1@25%      83.4   ASFormer
Action Localization  50 Salads    F1@50%      76     ASFormer
Action Localization  Assembly101  Edit        30.5   ASFormer
Action Localization  Assembly101  F1@10%      33.4   ASFormer
Action Localization  Assembly101  F1@25%      29.2   ASFormer
Action Localization  Assembly101  F1@50%      21.4   ASFormer
Action Localization  Assembly101  MoF         38.8   ASFormer
Action Localization  GTEA         Acc         79.7   ASFormer
Action Localization  GTEA         Edit        84.6   ASFormer
Action Localization  GTEA         F1@10%      90.1   ASFormer
Action Localization  GTEA         F1@25%      88.8   ASFormer
Action Localization  GTEA         F1@50%      79.2   ASFormer
Action Localization  Breakfast    Acc         73.5   ASFormer
Action Localization  Breakfast    Average F1  68     ASFormer
Action Localization  Breakfast    Edit        75     ASFormer
Action Localization  Breakfast    F1@10%      76     ASFormer
Action Localization  Breakfast    F1@25%      70.6   ASFormer
Action Localization  Breakfast    F1@50%      57.4   ASFormer
Action Segmentation  50 Salads    Acc         85.9   ASFormer+ASRF
Action Segmentation  50 Salads    Edit        81.9   ASFormer+ASRF
Action Segmentation  50 Salads    F1@10%      85.1   ASFormer+ASRF
Action Segmentation  50 Salads    F1@25%      85.4   ASFormer+ASRF
Action Segmentation  50 Salads    F1@50%      79.3   ASFormer+ASRF
Action Segmentation  50 Salads    Acc         85.6   ASFormer
Action Segmentation  50 Salads    Edit        79.6   ASFormer
Action Segmentation  50 Salads    F1@10%      85.1   ASFormer
Action Segmentation  50 Salads    F1@25%      83.4   ASFormer
Action Segmentation  50 Salads    F1@50%      76     ASFormer
Action Segmentation  Assembly101  Edit        30.5   ASFormer
Action Segmentation  Assembly101  F1@10%      33.4   ASFormer
Action Segmentation  Assembly101  F1@25%      29.2   ASFormer
Action Segmentation  Assembly101  F1@50%      21.4   ASFormer
Action Segmentation  Assembly101  MoF         38.8   ASFormer
Action Segmentation  GTEA         Acc         79.7   ASFormer
Action Segmentation  GTEA         Edit        84.6   ASFormer
Action Segmentation  GTEA         F1@10%      90.1   ASFormer
Action Segmentation  GTEA         F1@25%      88.8   ASFormer
Action Segmentation  GTEA         F1@50%      79.2   ASFormer
Action Segmentation  Breakfast    Acc         73.5   ASFormer
Action Segmentation  Breakfast    Average F1  68     ASFormer
Action Segmentation  Breakfast    Edit        75     ASFormer
Action Segmentation  Breakfast    F1@10%      76     ASFormer
Action Segmentation  Breakfast    F1@25%      70.6   ASFormer
Action Segmentation  Breakfast    F1@50%      57.4   ASFormer
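
The metric names in the results (Acc, Edit, F1@k) are not defined on this page. The sketch below shows how they are commonly computed in the action segmentation literature: frame-wise accuracy, a segmental edit score based on normalized Levenshtein distance, and a segmental F1 at an IoU threshold. The function names and exact matching rules here are assumptions for illustration; the evaluation scripts behind the reported numbers may differ in details.

```python
def segments(labels):
    """Collapse per-frame labels into (label, start, end) runs; end is exclusive."""
    runs, start = [], 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            runs.append((labels[start], start, t))
            start = t
    return runs


def frame_accuracy(pred, gt):
    """Percentage of frames whose predicted label equals the ground truth."""
    correct = sum(p == g for p, g in zip(pred, gt))
    return 100.0 * correct / len(gt)


def edit_score(pred, gt):
    """100 * (1 - Levenshtein distance between segment label sequences / longer length)."""
    p = [s[0] for s in segments(pred)]
    g = [s[0] for s in segments(gt)]
    prev = list(range(len(g) + 1))
    for i in range(1, len(p) + 1):
        cur = [i] + [0] * len(g)
        for j in range(1, len(g) + 1):
            cost = 0 if p[i - 1] == g[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
        prev = cur
    return 100.0 * (1 - prev[-1] / max(len(p), len(g)))


def f1_at_iou(pred, gt, threshold):
    """Segmental F1: a predicted segment is a true positive when its IoU with an
    unmatched ground-truth segment of the same label reaches the threshold."""
    p_segs, g_segs = segments(pred), segments(gt)
    matched = [False] * len(g_segs)
    tp = fp = 0
    for label, ps, pe in p_segs:
        best_iou, best_idx = 0.0, -1
        for idx, (g_label, gs, ge) in enumerate(g_segs):
            if g_label != label:
                continue
            inter = max(0, min(pe, ge) - max(ps, gs))
            # Span from earliest start to latest end; equals the union when
            # the two segments overlap.
            union = max(pe, ge) - min(ps, gs)
            iou = inter / union
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        if best_idx >= 0 and best_iou >= threshold and not matched[best_idx]:
            matched[best_idx] = True
            tp += 1
        else:
            fp += 1
    fn = len(g_segs) - sum(matched)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 100.0 * 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gt = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
    pred = [0, 0, 0, 1, 1, 1, 1, 1, 2, 2]
    print("Acc:", frame_accuracy(pred, gt))
    print("Edit:", edit_score(pred, gt))
    for k in (0.10, 0.25, 0.50):
        print(f"F1@{int(k * 100)}%:", f1_at_iou(pred, gt, k))
```

Frame-wise accuracy rewards long segments and ignores over-segmentation, which is why the segment-level Edit and F1@k scores are reported alongside it.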
