Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


IGFormer: Interaction Graph Transformer for Skeleton-based Human Interaction Recognition

Yunsheng Pang, Qiuhong Ke, Hossein Rahmani, James Bailey, Jun Liu

2022-07-25 · Human Interaction Recognition
Paper · PDF

Abstract

Human interaction recognition plays an important role in many applications. One crucial cue for recognizing an interaction is the set of interactive body parts. In this work, we propose a novel Interaction Graph Transformer (IGFormer) network for skeleton-based interaction recognition that models the interactive body parts as graphs. More specifically, the proposed IGFormer constructs interaction graphs according to the semantic and distance correlations between the interactive body parts, and enhances the representation of each person by aggregating the information of the interactive body parts based on the learned graphs. Furthermore, we propose a Semantic Partition Module that transforms each human skeleton sequence into a Body-Part-Time sequence to better capture the spatial and temporal information of the skeleton sequence for learning the graphs. Extensive experiments on three benchmark datasets demonstrate that our model outperforms the state of the art by a significant margin.
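The core idea in the abstract — learn a graph over the body parts of two interacting people, then enhance each person's features by aggregating the partner's interactive parts — can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the dot-product affinity, softmax normalization, and all names here are assumptions, and the actual IGFormer also incorporates semantic and distance correlations plus the Semantic Partition Module.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def interaction_aggregate(parts_a, parts_b):
    """Hypothetical sketch of graph-based cross-person aggregation.

    parts_a, parts_b: (P, C) arrays of per-body-part features for
    person A and person B (P parts, C channels).
    """
    # Affinity between every part of A and every part of B
    # (dot-product similarity stands in for the paper's learned
    # semantic/distance correlations).
    affinity = parts_a @ parts_b.T            # (P, P)
    # Row-normalize to obtain an interaction graph over B's parts.
    graph = softmax(affinity, axis=-1)        # rows sum to 1
    # Enhance A's features with information gathered from B's
    # interactive parts, via a residual connection.
    return parts_a + graph @ parts_b          # (P, C)

rng = np.random.default_rng(0)
P, C = 5, 16                                  # body parts, channels
a = rng.standard_normal((P, C))
b = rng.standard_normal((P, C))
out = interaction_aggregate(a, b)
print(out.shape)                              # (5, 16)
```

In practice each person would be enhanced symmetrically (A from B and B from A), and the aggregation would sit inside transformer layers operating on the Body-Part-Time sequence.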

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Human Interaction Recognition | NTU RGB+D | Accuracy (Cross-Subject) | 93.6 | IGFormer |
| Human Interaction Recognition | NTU RGB+D | Accuracy (Cross-View) | 96.5 | IGFormer |
| Human Interaction Recognition | SBU / SBU-Refine | Accuracy | 98.4 | IGFormer |
| Human Interaction Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 86.5 | IGFormer |
| Human Interaction Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 85.4 | IGFormer |

Related Papers

- Dynamic Scene Understanding from Vision-Language Representations (2025-01-20)
- OV-HHIR: Open Vocabulary Human Interaction Recognition Using Cross-modal Integration of Large Language Models (2024-12-31)
- CHASE: Learning Convex Hull Adaptive Shift for Skeleton-based Multi-Entity Action Recognition (2024-10-09)
- Empathic Grounding: Explorations using Multimodal Interaction and Large Language Models with Conversational Agents (2024-07-01)
- Exploring Vision Transformers for 3D Human Motion-Language Models with Motion Patches (2024-05-08)
- SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition (2024-03-14)
- Learning Mutual Excitation for Hand-to-Hand and Human-to-Human Interaction Recognition (2024-02-04)
- A Two-stream Hybrid CNN-Transformer Network for Skeleton-based Human Interaction Recognition (2023-12-31)