Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Focusing and Diffusion: Bidirectional Attentive Graph Convolutional Networks for Skeleton-based Action Recognition

Jialin Gao, Tong He, Xi Zhou, Shiming Ge

2019-12-24 · Skeleton Based Action Recognition · Action Recognition

Paper · PDF

Abstract

A collection of approaches based on graph convolutional networks has proven successful in skeleton-based action recognition by exploring neighborhood information and dense dependencies between intra-frame joints. However, these approaches usually ignore the spatial-temporal global context as well as the local relations between inter-frame and intra-frame joints. In this paper, we propose a focusing and diffusion mechanism to enhance graph convolutional networks by attending to the kinematic dependencies of the articulated human pose within a frame and their implicit dependencies across frames. In the focusing process, we introduce an attention module that learns a latent node over the intra-frame joints to convey spatial contextual information. In this way, the sparse connections between joints in a frame are well captured, while the global context over the entire sequence is further captured by passing these hidden nodes through a bidirectional LSTM. In the diffusion process, the learned spatial-temporal contextual information is passed back to the spatial joints, yielding a bidirectional attentive graph convolutional network (BAGCN) that facilitates skeleton-based action recognition. Extensive experiments on the challenging NTU RGB+D and Skeleton-Kinetics benchmarks demonstrate the efficacy of our approach.
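To make the mechanism concrete, here is a minimal, hypothetical PyTorch sketch of the focusing and diffusion steps as the abstract describes them: attention pools the V joints of each frame into one latent node (focusing), a bidirectional LSTM carries context across the T latent nodes, and that context is broadcast back onto every joint (diffusion). The module name `FocusDiffuse`, the additive-softmax attention form, and all layer sizes are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of the focusing-and-diffusion mechanism from the
# abstract; shapes and layer choices are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FocusDiffuse(nn.Module):
    """Focusing: attention pools V joint features into one latent node per
    frame. A bidirectional LSTM over the T latent nodes gathers global
    spatial-temporal context. Diffusion: the context is broadcast back to
    every joint and fused with the original joint features."""
    def __init__(self, in_dim=64, ctx_dim=64):
        super().__init__()
        self.score = nn.Linear(in_dim, 1)            # attention score per joint
        self.bilstm = nn.LSTM(in_dim, ctx_dim // 2,
                              batch_first=True, bidirectional=True)
        self.fuse = nn.Linear(in_dim + ctx_dim, in_dim)

    def forward(self, x):
        # x: (N, T, V, C) -- batch, frames, joints, channels
        a = F.softmax(self.score(x), dim=2)          # (N, T, V, 1), over joints
        latent = (a * x).sum(dim=2)                  # focusing: (N, T, C)
        ctx, _ = self.bilstm(latent)                 # global context: (N, T, ctx_dim)
        ctx = ctx.unsqueeze(2).expand(-1, -1, x.size(2), -1)
        return self.fuse(torch.cat([x, ctx], dim=-1))  # diffusion back to joints

if __name__ == "__main__":
    feats = torch.randn(2, 30, 25, 64)   # e.g. NTU RGB+D skeletons: 25 joints
    print(FocusDiffuse()(feats).shape)   # torch.Size([2, 30, 25, 64])
```

In a full model, a block like this would sit between graph-convolution layers so each joint's feature is refined by sequence-level context before the next spatial convolution; that placement is likewise an assumption here.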

Results

Task | Dataset | Metric | Value | Model
Video | NTU RGB+D | Accuracy (CS) | 90.3 | BAGCN
Video | NTU RGB+D | Accuracy (CV) | 96.3 | BAGCN
Temporal Action Localization | NTU RGB+D | Accuracy (CS) | 90.3 | BAGCN
Temporal Action Localization | NTU RGB+D | Accuracy (CV) | 96.3 | BAGCN
Zero-Shot Learning | NTU RGB+D | Accuracy (CS) | 90.3 | BAGCN
Zero-Shot Learning | NTU RGB+D | Accuracy (CV) | 96.3 | BAGCN
Activity Recognition | NTU RGB+D | Accuracy (CS) | 90.3 | BAGCN
Activity Recognition | NTU RGB+D | Accuracy (CV) | 96.3 | BAGCN
Action Localization | NTU RGB+D | Accuracy (CS) | 90.3 | BAGCN
Action Localization | NTU RGB+D | Accuracy (CV) | 96.3 | BAGCN
Action Detection | NTU RGB+D | Accuracy (CS) | 90.3 | BAGCN
Action Detection | NTU RGB+D | Accuracy (CV) | 96.3 | BAGCN
3D Action Recognition | NTU RGB+D | Accuracy (CS) | 90.3 | BAGCN
3D Action Recognition | NTU RGB+D | Accuracy (CV) | 96.3 | BAGCN
Action Recognition | NTU RGB+D | Accuracy (CS) | 90.3 | BAGCN
Action Recognition | NTU RGB+D | Accuracy (CV) | 96.3 | BAGCN

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception (2025-06-26)
Feature Hallucination for Self-supervised Action Recognition (2025-06-25)
CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition (2025-06-25)
Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)
Adapting Vision-Language Models for Evaluating World Models (2025-06-22)
Active Multimodal Distillation for Few-shot Action Recognition (2025-06-16)