
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Segmental Spatiotemporal CNNs for Fine-grained Action Segmentation

Colin Lea, Austin Reiter, Rene Vidal, Gregory D. Hager

2016-02-09 | Action Segmentation, Action Classification, Fine-grained Action Recognition, Segmentation, General Classification, Action Recognition, Temporal Action Localization

Abstract

Joint segmentation and classification of fine-grained actions is important for applications of human-robot interaction, video surveillance, and human skill evaluation. However, despite substantial recent progress in large-scale action classification, the performance of state-of-the-art fine-grained action recognition approaches remains low. We propose a model for action segmentation which combines low-level spatiotemporal features with a high-level segmental classifier. Our spatiotemporal CNN is comprised of a spatial component that uses convolutional filters to capture information about objects and their relationships, and a temporal component that uses large 1D convolutional filters to capture information about how object relationships change across time. These features are used in tandem with a semi-Markov model that models transitions from one action to another. We introduce an efficient constrained segmental inference algorithm for this model that is orders of magnitude faster than the current approach. We highlight the effectiveness of our Segmental Spatiotemporal CNN on cooking and surgical action datasets for which we observe substantially improved performance relative to recent baseline methods.
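
The abstract mentions a semi-Markov model over action transitions and a constrained segmental inference step. As a rough illustration of that idea, the sketch below decodes a label sequence from per-frame class scores while allowing at most a fixed number of segments. It is a generic dynamic program written for this page, not the authors' algorithm: the function name `segmental_decode`, the `scores` and `trans` inputs, and the `max_segments` constraint are all assumptions made for the example.

```python
def segmental_decode(scores, trans, max_segments):
    """Pick the best frame labeling that uses at most `max_segments` segments.

    scores[t][c]: score of class c at frame t (hypothetical input, e.g. CNN outputs).
    trans[p][c]:  score added when a segment of class p is followed by class c.
    Runs in O(max_segments * T * C^2) time for T frames and C classes.
    """
    T, C = len(scores), len(scores[0])
    NEG = float("-inf")
    # best[k][c]: best score so far using k+1 segments, with the last frame labeled c.
    best = [[NEG] * C for _ in range(max_segments)]
    best[0] = list(scores[0])
    back = []  # back[t-1][k][c] = (previous k, previous class)
    for t in range(1, T):
        new = [[NEG] * C for _ in range(max_segments)]
        ptr = [[None] * C for _ in range(max_segments)]
        for k in range(max_segments):
            for c in range(C):
                # Option 1: stay in the current segment.
                cand, arg = best[k][c], (k, c)
                # Option 2: start a new segment after a different class.
                if k > 0:
                    for p in range(C):
                        if p != c and best[k - 1][p] + trans[p][c] > cand:
                            cand, arg = best[k - 1][p] + trans[p][c], (k - 1, p)
                if cand > NEG:
                    new[k][c] = cand + scores[t][c]
                    ptr[k][c] = arg
        best = new
        back.append(ptr)
    # Choose the best final state, then follow back-pointers to recover labels.
    k, c = max(((k, c) for k in range(max_segments) for c in range(C)),
               key=lambda kc: best[kc[0]][kc[1]])
    labels = [c]
    for ptr in reversed(back):
        k, c = ptr[k][c]
        labels.append(c)
    labels.reverse()
    return labels


# Toy example: frame 2 is a noisy spike toward class 1.
scores = [[2, 0], [2, 0], [0, 1], [2, 0], [0, 2], [0, 2]]
trans = [[0, 0], [0, 0]]
print(segmental_decode(scores, trans, max_segments=2))
```

With two segments allowed, the noisy frame is absorbed into the surrounding class-0 segment; raising `max_segments` to 4 recovers the frame-wise argmax, which illustrates how a segment-count constraint can suppress over-segmentation.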

Results

Task                 Dataset  Metric         Value  Model
Action Localization  JIGSAWS  Accuracy       74.22  ST-CNN+Seg
Action Localization  JIGSAWS  Edit Distance  66.56  ST-CNN+Seg
Action Localization  GTEA     Accuracy       60.6   ST-CNN
Action Localization  GTEA     F1@10%         58.7   ST-CNN
Action Localization  GTEA     F1@25%         54.4   ST-CNN
Action Localization  GTEA     F1@50%         41.9   ST-CNN
Action Segmentation  JIGSAWS  Accuracy       74.22  ST-CNN+Seg
Action Segmentation  JIGSAWS  Edit Distance  66.56  ST-CNN+Seg
Action Segmentation  GTEA     Accuracy       60.6   ST-CNN
Action Segmentation  GTEA     F1@10%         58.7   ST-CNN
Action Segmentation  GTEA     F1@25%         54.4   ST-CNN
Action Segmentation  GTEA     F1@50%         41.9   ST-CNN
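
The page does not define the metrics in the results above. The sketch below shows one common way such numbers are computed in action segmentation work, under stated assumptions: "Accuracy" is taken to be frame-wise accuracy, "Edit Distance" is taken to be a normalized segment-level edit score (higher is better), and "F1@k%" is taken to be a segmental F1 with an intersection-over-union threshold of k%. The exact evaluation protocol behind the reported values may differ, and all function names here are illustrative.

```python
from itertools import groupby


def to_segments(frames):
    """Collapse per-frame labels into (label, start, end) runs; end is exclusive."""
    segments, start = [], 0
    for label, run in groupby(frames):
        length = len(list(run))
        segments.append((label, start, start + length))
        start += length
    return segments


def frame_accuracy(pred, true):
    """Percentage of frames whose predicted label matches the ground truth."""
    return 100.0 * sum(p == t for p, t in zip(pred, true)) / len(true)


def levenshtein(a, b):
    """Classic edit distance between two label sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def edit_score(pred, true):
    """Edit distance over segment labels, normalized to 0..100 (100 = same order)."""
    p = [s[0] for s in to_segments(pred)]
    t = [s[0] for s in to_segments(true)]
    return 100.0 * (1 - levenshtein(p, t) / max(len(p), len(t)))


def f1_at_k(pred, true, threshold):
    """Segmental F1: a predicted segment counts as correct when its IoU with an
    unmatched ground-truth segment of the same label reaches `threshold`."""
    p_segs, t_segs = to_segments(pred), to_segments(true)
    matched = [False] * len(t_segs)
    tp = 0
    for label, ps, pe in p_segs:
        best_iou, best_idx = 0.0, -1
        for idx, (tl, ts, te) in enumerate(t_segs):
            if tl != label or matched[idx]:
                continue
            inter = max(0, min(pe, te) - max(ps, ts))
            union = max(pe, te) - min(ps, ts)
            iou = inter / union
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        if best_idx >= 0 and best_iou >= threshold:
            matched[best_idx] = True
            tp += 1
    if tp == 0:
        return 0.0
    precision = tp / len(p_segs)
    recall = tp / len(t_segs)
    return 100.0 * 2 * precision * recall / (precision + recall)


true = [0, 0, 0, 1, 1, 1, 2, 2]
pred = [0, 0, 1, 1, 1, 1, 2, 2]
print(frame_accuracy(pred, true), edit_score(pred, true), f1_at_k(pred, true, 0.5))
```

In this toy case one boundary is off by a single frame, so frame accuracy drops while the edit score and F1@50% are unaffected. This is why segment-level metrics are often reported alongside frame-wise accuracy.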

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation (2025-07-17)
Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)