


How Much Temporal Long-Term Context is Needed for Action Segmentation?

Emad Bahrami, Gianpiero Francesca, Juergen Gall

Published: 2023-08-22
Venue: ICCV 2023
Tasks: Action Segmentation, Temporal Action Segmentation, Segmentation

Abstract

Modeling long-term context in videos is crucial for many fine-grained tasks including temporal action segmentation. An interesting question that is still open is how much long-term temporal context is needed for optimal performance. While transformers can model the long-term context of a video, this becomes computationally prohibitive for long videos. Recent works on temporal action segmentation thus combine temporal convolutional networks with self-attentions that are computed only for a local temporal window. While these approaches show good results, their performance is limited by their inability to capture the full context of a video. In this work, we try to answer how much long-term temporal context is required for temporal action segmentation by introducing a transformer-based model that leverages sparse attention to capture the full context of a video. We compare our model with the current state of the art on three datasets for temporal action segmentation, namely 50Salads, Breakfast, and Assembly101. Our experiments show that modeling the full context of a video is necessary to obtain the best performance for temporal action segmentation.
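
The contrast the abstract draws between window-limited attention and sparse attention over the whole video can be illustrated with a toy connectivity count. This is only a sketch: the abstract does not specify the sparsity pattern, so the strided pattern below, along with the names `local_window_mask`, `strided_mask`, `window`, and `stride`, are assumptions made for illustration and are not taken from the paper.

```python
from typing import List

Mask = List[List[bool]]  # mask[i][j] is True if frame i may attend to frame j


def full_mask(num_frames: int) -> Mask:
    """Dense attention: every frame attends to every frame (quadratic cost)."""
    return [[True] * num_frames for _ in range(num_frames)]


def local_window_mask(num_frames: int, window: int) -> Mask:
    """Frames attend only within their own non-overlapping window."""
    return [[i // window == j // window for j in range(num_frames)]
            for i in range(num_frames)]


def strided_mask(num_frames: int, stride: int) -> Mask:
    """Hypothetical sparse pattern: frame i attends to every stride-th frame
    sharing its offset, so the attended frames span the whole video."""
    return [[i % stride == j % stride for j in range(num_frames)]
            for i in range(num_frames)]


def union(a: Mask, b: Mask) -> Mask:
    """Combine two patterns: a pair is allowed if either mask allows it."""
    return [[x or y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def count_pairs(mask: Mask) -> int:
    """Number of (query, key) pairs that must be computed."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    T, W = 12, 4
    local = local_window_mask(T, W)
    sparse = strided_mask(T, W)
    print("full pairs:   ", count_pairs(full_mask(T)))
    print("local pairs:  ", count_pairs(local))
    print("strided pairs:", count_pairs(sparse))
    print("first frame sees frame 8 (local):  ", local[0][8])
    print("first frame sees frame 8 (strided):", sparse[0][8])
```

In this toy setting the windowed mask never connects the first window to later ones, while the strided mask links frames across the whole sequence at a fraction of the dense cost. That is the kind of trade-off the abstract refers to, under the stated assumptions.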

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Action Localization | 50 Salads | Acc | 87.7 | LTContext |
| Action Localization | 50 Salads | Edit | 83.2 | LTContext |
| Action Localization | 50 Salads | F1@10% | 89.4 | LTContext |
| Action Localization | 50 Salads | F1@25% | 87.7 | LTContext |
| Action Localization | 50 Salads | F1@50% | 82 | LTContext |
| Action Localization | Assembly101 | Edit | 30.4 | LTContext |
| Action Localization | Assembly101 | F1@10% | 33.9 | LTContext |
| Action Localization | Assembly101 | F1@25% | 30 | LTContext |
| Action Localization | Assembly101 | F1@50% | 22.6 | LTContext |
| Action Localization | Assembly101 | MoF | 41.2 | LTContext |
| Action Localization | Breakfast | Acc | 74.2 | LTContext |
| Action Localization | Breakfast | Average F1 | 70.1 | LTContext |
| Action Localization | Breakfast | Edit | 77 | LTContext |
| Action Localization | Breakfast | F1@10% | 77.6 | LTContext |
| Action Localization | Breakfast | F1@25% | 72.6 | LTContext |
| Action Localization | Breakfast | F1@50% | 60.1 | LTContext |
| Action Segmentation | 50 Salads | Acc | 87.7 | LTContext |
| Action Segmentation | 50 Salads | Edit | 83.2 | LTContext |
| Action Segmentation | 50 Salads | F1@10% | 89.4 | LTContext |
| Action Segmentation | 50 Salads | F1@25% | 87.7 | LTContext |
| Action Segmentation | 50 Salads | F1@50% | 82 | LTContext |
| Action Segmentation | Assembly101 | Edit | 30.4 | LTContext |
| Action Segmentation | Assembly101 | F1@10% | 33.9 | LTContext |
| Action Segmentation | Assembly101 | F1@25% | 30 | LTContext |
| Action Segmentation | Assembly101 | F1@50% | 22.6 | LTContext |
| Action Segmentation | Assembly101 | MoF | 41.2 | LTContext |
| Action Segmentation | Breakfast | Acc | 74.2 | LTContext |
| Action Segmentation | Breakfast | Average F1 | 70.1 | LTContext |
| Action Segmentation | Breakfast | Edit | 77 | LTContext |
| Action Segmentation | Breakfast | F1@10% | 77.6 | LTContext |
| Action Segmentation | Breakfast | F1@25% | 72.6 | LTContext |
| Action Segmentation | Breakfast | F1@50% | 60.1 | LTContext |
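
For readers unfamiliar with the metric names in the results, the following sketch shows how frame-wise accuracy (Acc, also reported as MoF), the segmental edit score, and segmental F1 at an IoU threshold are commonly computed for temporal action segmentation. It assumes the conventional definitions used in this literature; it is not the evaluation code behind the numbers listed here, and all function names and the toy label sequences are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[str, int, int]  # (label, start frame, end frame exclusive)


def to_segments(frames: List[str]) -> List[Segment]:
    """Collapse a per-frame label sequence into contiguous segments."""
    segments, start = [], 0
    for i in range(1, len(frames) + 1):
        if i == len(frames) or frames[i] != frames[start]:
            segments.append((frames[start], start, i))
            start = i
    return segments


def frame_accuracy(pred: List[str], gt: List[str]) -> float:
    """Percentage of frames whose predicted label matches the ground truth."""
    return 100.0 * sum(p == g for p, g in zip(pred, gt)) / len(gt)


def edit_score(pred: List[str], gt: List[str]) -> float:
    """Normalized Levenshtein similarity between the two segment label orders."""
    p = [s[0] for s in to_segments(pred)]
    g = [s[0] for s in to_segments(gt)]
    prev = list(range(len(g) + 1))
    for i in range(1, len(p) + 1):
        cur = [i] + [0] * len(g)
        for j in range(1, len(g) + 1):
            cost = 0 if p[i - 1] == g[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
        prev = cur
    return 100.0 * (1.0 - prev[len(g)] / max(len(p), len(g), 1))


def f1_at_iou(pred: List[str], gt: List[str], threshold: float) -> float:
    """Segmental F1: a predicted segment is a true positive if its IoU with an
    unmatched ground-truth segment of the same label reaches the threshold."""
    p_segs, g_segs = to_segments(pred), to_segments(gt)
    matched = [False] * len(g_segs)
    tp = 0
    for label, ps, pe in p_segs:
        best_iou, best_idx = 0.0, -1
        for idx, (g_label, gs, ge) in enumerate(g_segs):
            if g_label != label:
                continue
            inter = max(0, min(pe, ge) - max(ps, gs))
            iou = inter / (max(pe, ge) - min(ps, gs))
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        if best_idx >= 0 and best_iou >= threshold and not matched[best_idx]:
            tp += 1
            matched[best_idx] = True
    fp, fn = len(p_segs) - tp, len(g_segs) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 100.0 * 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gt = ["a"] * 4 + ["b"] * 4 + ["c"] * 2
    over_segmented = ["a", "a", "b", "a", "b", "b", "b", "b", "c", "c"]
    print("Acc:   ", frame_accuracy(over_segmented, gt))
    print("Edit:  ", edit_score(over_segmented, gt))
    print("F1@50%:", f1_at_iou(over_segmented, gt, 0.5))
```

In the toy example a single mislabeled frame barely changes frame-wise accuracy but splits one segment into three, which lowers the edit score and F1 considerably. This is why the segmental metrics are reported alongside accuracy.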

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation (2025-07-17)
Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)