Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Semantic2Graph: Graph-based Multi-modal Feature Fusion for Action Segmentation in Videos

Junbin Zhang, Pei-Hsuan Tsai, Meng-Hsun Tsai

2022-09-13 | Tasks: Action Segmentation, Node Classification

Abstract

Video action segmentation has been widely applied in many fields. Most previous studies employed video-based vision models for this purpose. However, these models often rely on a large receptive field, LSTMs, or Transformers to capture long-term dependencies within videos, which leads to significant computational resource requirements. Graph-based models were proposed to address this challenge, but previous graph-based models are less accurate. Hence, this study introduces a graph-structured approach named Semantic2Graph that models long-term dependencies in videos, thereby reducing computational costs and raising accuracy. We construct a frame-level graph of the video. Temporal edges model the temporal relations and action order within videos. Additionally, we design positive and negative semantic edges, with corresponding edge weights, to capture both long-term and short-term semantic relationships in video actions. Node attributes comprise a rich set of multi-modal features extracted from video content, graph structures, and label text, covering visual, structural, and semantic cues. To synthesize this multi-modal information effectively, we employ a graph neural network (GNN) to fuse the multi-modal features for node action label classification. Experimental results show that Semantic2Graph outperforms state-of-the-art methods, particularly on benchmark datasets such as GTEA and 50Salads. Multiple ablation experiments further validate the effectiveness of semantic features in improving model performance. Notably, the semantic edges in Semantic2Graph allow long-term dependencies to be captured cost-effectively, affirming its utility in addressing the computational resource constraints of video-based vision models.
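The abstract describes the graph only at a high level. The following is a minimal sketch of the general idea under assumptions that are not taken from the paper: temporal edges link each frame to the next; a hypothetical `window` parameter limits semantic edges to frame pairs at most `window` frames apart (a large window lets positive edges reach distant frames); same-label pairs get a positive semantic edge (hypothetical weight `pos_weight`) and different-label pairs a negative one (hypothetical weight `neg_weight`); and per-frame labels are assumed to be available, e.g. ground truth at training time. Multi-modal node features are fused by plain concatenation, and one round of weighted message passing stands in for the GNN. All function names and parameters here are illustrative, not the authors' code.

```python
from typing import Dict, List, Tuple

# (source frame, target frame, edge weight)
Edge = Tuple[int, int, float]


def build_frame_graph(labels: List[int], window: int = 2,
                      pos_weight: float = 1.0,
                      neg_weight: float = -1.0) -> Dict[str, List[Edge]]:
    """Build a frame-level graph; edge rules and weights are assumptions."""
    n = len(labels)
    edges: Dict[str, List[Edge]] = {"temporal": [], "positive": [], "negative": []}
    # Temporal edges: connect each frame to its successor to encode order.
    for i in range(n - 1):
        edges["temporal"].append((i, i + 1, 1.0))
    # Semantic edges: compare every frame pair at most `window` frames apart.
    for i in range(n):
        for j in range(i + 1, min(n, i + window + 1)):
            if labels[i] == labels[j]:
                edges["positive"].append((i, j, pos_weight))
            else:
                edges["negative"].append((i, j, neg_weight))
    return edges


def fuse_features(visual: List[List[float]], structural: List[List[float]],
                  text: List[List[float]]) -> List[List[float]]:
    """Concatenate per-frame visual, structural and label-text features."""
    return [v + s + t for v, s, t in zip(visual, structural, text)]


def propagate(features: List[List[float]],
              edges: Dict[str, List[Edge]]) -> List[List[float]]:
    """One round of weighted message passing, treating edges as undirected.

    Each node keeps its own features and adds the weighted sum of its
    neighbours' features, normalised by the sum of absolute edge weights.
    """
    n, dim = len(features), len(features[0])
    agg = [[0.0] * dim for _ in range(n)]
    norm = [0.0] * n
    for edge_list in edges.values():
        for u, v, w in edge_list:
            for dst, src in ((u, v), (v, u)):
                for d in range(dim):
                    agg[dst][d] += w * features[src][d]
                norm[dst] += abs(w)
    out = []
    for i in range(n):
        scale = norm[i] if norm[i] > 0 else 1.0
        out.append([features[i][d] + agg[i][d] / scale for d in range(dim)])
    return out
```

For example, `build_frame_graph([0, 0, 1, 1], window=2)` yields 3 temporal, 2 positive and 3 negative edges. Negative weights make a node's update move away from neighbours with a different action, which is one plausible reading of "negative semantic edges"; the paper's actual weighting may differ.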

Results

Model: Semantic2Graph. Task: Action Segmentation (identical results are also listed under Action Localization). All values in %.

Dataset   | Acc  | Edit | F1@10% | F1@25% | F1@50%
50 Salads | 88.6 | 89.1 | 91.5   | 90.2   | 87.3
GTEA      | 89.8 | 92   | 95.7   | 94.2   | 91.3
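Acc is frame-wise accuracy, while Edit and F1@k are segment-level metrics commonly reported on 50 Salads and GTEA. A minimal sketch of how they are usually computed, assuming the widely used definitions: the edit score is 100 times one minus the Levenshtein distance between the predicted and ground-truth segment label sequences, normalised by the longer sequence; F1@k counts a predicted segment as a true positive if its IoU with an unmatched ground-truth segment of the same label is at least k (k = 0.10, 0.25, 0.50 for the three columns). This is not necessarily the exact evaluation code behind the numbers above, and it omits details such as background-class handling.

```python
from typing import List, Tuple

# (label, start frame, end frame) with the end frame exclusive
Segment = Tuple[int, int, int]


def to_segments(frames: List[int]) -> List[Segment]:
    """Collapse per-frame labels into runs of identical labels."""
    segs: List[Segment] = []
    start = 0
    for i in range(1, len(frames) + 1):
        if i == len(frames) or frames[i] != frames[start]:
            segs.append((frames[start], start, i))
            start = i
    return segs


def frame_accuracy(pred: List[int], gt: List[int]) -> float:
    """Percentage of frames whose predicted label matches the ground truth."""
    return 100.0 * sum(p == g for p, g in zip(pred, gt)) / len(gt)


def edit_score(pred: List[int], gt: List[int]) -> float:
    """Normalised Levenshtein similarity of the segment label sequences."""
    p = [s[0] for s in to_segments(pred)]
    g = [s[0] for s in to_segments(gt)]
    prev = list(range(len(g) + 1))
    for i in range(1, len(p) + 1):
        cur = [i] + [0] * len(g)
        for j in range(1, len(g) + 1):
            cost = 0 if p[i - 1] == g[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
        prev = cur
    return (1.0 - prev[len(g)] / max(len(p), len(g), 1)) * 100.0


def f1_at_k(pred: List[int], gt: List[int], k: float) -> float:
    """Segmental F1 with IoU threshold k (e.g. 0.1, 0.25, 0.5)."""
    p_segs, g_segs = to_segments(pred), to_segments(gt)
    matched = [False] * len(g_segs)
    tp = fp = 0
    for label, ps, pe in p_segs:
        # Find the same-label ground-truth segment with the highest IoU.
        best_iou, best_idx = 0.0, -1
        for idx, (gl, gs, ge) in enumerate(g_segs):
            if gl != label:
                continue
            inter = max(0, min(pe, ge) - max(ps, gs))
            union = max(pe, ge) - min(ps, gs)
            iou = inter / union
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        if best_idx >= 0 and best_iou >= k and not matched[best_idx]:
            tp += 1
            matched[best_idx] = True
        else:
            fp += 1
    fn = len(g_segs) - sum(matched)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 100.0 * 2 * precision * recall / (precision + recall)
```

For instance, with `gt = [0, 0, 0, 1, 1, 1]` and `pred = [0, 0, 1, 1, 1, 1]`, the edit score is 100.0 (same segment order) and F1 at a 0.5 threshold is 100.0, but raising the threshold to 0.7 drops F1 to 50.0 because the first predicted segment overlaps its ground-truth segment with an IoU of only 2/3. This is why F1@50% is typically the lowest of the three F1 columns.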

Related Papers

Self-supervised pretraining of vision transformers for animal behavioral analysis and neural encoding (2025-07-13)
Demystifying Distributed Training of Graph Neural Networks for Link Prediction (2025-06-25)
Equivariance Everywhere All At Once: A Recipe for Graph Foundation Models (2025-06-17)
Delving into Instance-Dependent Label Noise in Graph Data: A Comprehensive Study and Benchmark (2025-06-14)
Graph Semi-Supervised Learning for Point Classification on Data Manifolds (2025-06-13)
HopaDIFF: Holistic-Partial Aware Fourier Conditioned Diffusion for Referring Human Action Segmentation in Multi-Person Scenarios (2025-06-11)
Devil's Hand: Data Poisoning Attacks to Locally Private Graph Learning Protocols (2025-06-11)
Wasserstein Hypergraph Neural Network (2025-06-11)