Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Temporal Relational Modeling with Self-Supervision for Action Segmentation

Dong Wang, Di Hu, Xingjian Li, Dejing Dou

2020-12-14 · Action Segmentation · Action Recognition · Action Understanding

Paper · PDF · Code (official)

Abstract

Temporal relational modeling in video is essential for human action understanding tasks such as action recognition and action segmentation. Although Graph Convolution Networks (GCNs) have shown promising advantages in relation reasoning on many tasks, effectively applying them to long video sequences remains a challenge. The main reason is that the large number of nodes (i.e., video frames) makes it hard for GCNs to capture and model temporal relations in videos. To tackle this problem, in this paper we introduce an effective GCN module, the Dilated Temporal Graph Reasoning Module (DTGRM), designed to model temporal relations and dependencies between video frames at various time spans. In particular, we capture and model temporal relations by constructing multi-level dilated temporal graphs in which the nodes represent frames from different moments in the video. Moreover, to enhance the temporal reasoning ability of the proposed model, an auxiliary self-supervised task is introduced that encourages the dilated temporal graph reasoning module to find and correct wrong temporal relations in videos. Our DTGRM model outperforms state-of-the-art action segmentation models on three challenging datasets: 50Salads, Georgia Tech Egocentric Activities (GTEA), and the Breakfast dataset. The code is available at https://github.com/redwang/DTGRM.
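The multi-level dilated temporal graphs described above can be illustrated with a minimal sketch: one adjacency matrix per dilation rate, where frame t is connected to frames t-d and t+d. This is an assumption-laden simplification for intuition, not the paper's actual implementation (which builds learned graphs on frame features); the function name and dilation rates are hypothetical.

```python
import numpy as np

def dilated_temporal_adjacency(num_frames, dilations=(1, 2, 4)):
    """Build one adjacency matrix per dilation rate: frame t is linked to
    frames t - d and t + d, so larger d captures longer-range temporal
    relations. Illustrative only; the dilation rates are hypothetical."""
    adjs = []
    for d in dilations:
        A = np.zeros((num_frames, num_frames), dtype=np.float32)
        for t in range(num_frames):
            A[t, t] = 1.0  # self-loop, standard practice in GCNs
            if t - d >= 0:
                A[t, t - d] = 1.0  # edge to the frame d steps earlier
            if t + d < num_frames:
                A[t, t + d] = 1.0  # edge to the frame d steps later
        adjs.append(A)
    return adjs
```

Stacking GCN layers over these matrices lets each frame aggregate features from progressively wider temporal neighborhoods without the dense all-pairs graph that makes plain GCNs impractical on long videos.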

Results

Task                | Dataset   | Metric     | Value | Model
--------------------|-----------|------------|-------|------
Action Localization | 50 Salads | Acc        | 80    | DTGRM
Action Localization | 50 Salads | Edit       | 72    | DTGRM
Action Localization | 50 Salads | F1@10%     | 79.1  | DTGRM
Action Localization | 50 Salads | F1@25%     | 75.9  | DTGRM
Action Localization | 50 Salads | F1@50%     | 66.1  | DTGRM
Action Localization | Breakfast | Acc        | 68.3  | DTGRM
Action Localization | Breakfast | Average F1 | 59.1  | DTGRM
Action Localization | Breakfast | Edit       | 68.9  | DTGRM
Action Localization | Breakfast | F1@10%     | 68.7  | DTGRM
Action Localization | Breakfast | F1@25%     | 61.9  | DTGRM
Action Localization | Breakfast | F1@50%     | 46.6  | DTGRM
Action Segmentation | 50 Salads | Acc        | 80    | DTGRM
Action Segmentation | 50 Salads | Edit       | 72    | DTGRM
Action Segmentation | 50 Salads | F1@10%     | 79.1  | DTGRM
Action Segmentation | 50 Salads | F1@25%     | 75.9  | DTGRM
Action Segmentation | 50 Salads | F1@50%     | 66.1  | DTGRM
Action Segmentation | Breakfast | Acc        | 68.3  | DTGRM
Action Segmentation | Breakfast | Average F1 | 59.1  | DTGRM
Action Segmentation | Breakfast | Edit       | 68.9  | DTGRM
Action Segmentation | Breakfast | F1@10%     | 68.7  | DTGRM
Action Segmentation | Breakfast | F1@25%     | 61.9  | DTGRM
Action Segmentation | Breakfast | F1@50%     | 46.6  | DTGRM
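The F1@k metrics in the table are the segmental F1 scores commonly used for action segmentation: a predicted segment counts as a true positive if its IoU with an unmatched ground-truth segment of the same label exceeds the threshold k (e.g. 10%). The sketch below follows that standard definition and is an assumption about the evaluation protocol, not code taken from the paper; function names are hypothetical.

```python
def segments_from_labels(labels):
    """Collapse a per-frame label sequence into (label, start, end) segments,
    with end exclusive."""
    segs, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segs.append((labels[start], start, i))
            start = i
    return segs

def f1_at_k(pred, gt, tau=0.1):
    """Segmental F1@tau over per-frame label sequences: a predicted segment
    is a true positive if its IoU with an unmatched same-label ground-truth
    segment exceeds tau (tau=0.1 corresponds to F1@10%)."""
    p_segs, g_segs = segments_from_labels(pred), segments_from_labels(gt)
    matched = [False] * len(g_segs)
    tp = 0
    for pl, ps, pe in p_segs:
        best_iou, best_j = 0.0, -1
        for j, (gl, gs, ge) in enumerate(g_segs):
            if gl != pl or matched[j]:
                continue
            inter = max(0, min(pe, ge) - max(ps, gs))
            union = max(pe, ge) - min(ps, gs)
            iou = inter / union
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_iou > tau:  # greedy one-to-one matching
            tp += 1
            matched[best_j] = True
    fp = len(p_segs) - tp
    fn = len(g_segs) - tp
    if tp == 0:
        return 0.0
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)
```

Because matching is done at the segment level, F1@k penalizes over-segmentation (many short spurious segments) in a way that plain frame-wise accuracy (Acc) does not, which is why both are reported.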

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
Self-supervised pretraining of vision transformers for animal behavioral analysis and neural encoding (2025-07-13)
Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception (2025-06-26)
LLaVA-Pose: Enhancing Human Pose and Action Understanding via Keypoint-Integrated Instruction Tuning (2025-06-26)
Feature Hallucination for Self-supervised Action Recognition (2025-06-25)
CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition (2025-06-25)
Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)