Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Interaction Relational Network for Mutual Action Recognition

Mauricio Perez, Jun Liu, Alex C. Kot

2019-10-11 · Relational Reasoning · Action Recognition · Human Interaction Recognition

Paper · PDF · Code (official)

Abstract

Person-person mutual action recognition (also referred to as interaction recognition) is an important research branch of human activity analysis. Current solutions in the field -- mainly dominated by CNNs, GCNs and LSTMs -- often rely on complicated architectures and mechanisms that embed the relationships between the two persons in the architecture itself, to ensure the interaction patterns can be properly learned. Our main contribution in this work is a simpler yet very powerful architecture, named Interaction Relational Network (IRN), which uses minimal prior knowledge about the structure of the human body. We drive the network to identify by itself how to relate the body parts of the interacting individuals. To better represent the interaction, we define two different relationships, leading to specialized architectures and models for each. These multiple relationship models are then fused into a single, special architecture, in order to leverage both streams of information and further enhance the relational reasoning capability. Furthermore, we define important structured pairwise operations to extract meaningful extra information from each pair of joints -- distance and motion. Ultimately, coupled with an LSTM, our IRN is capable of paramount sequential relational reasoning. These extensions to our network can also be valuable to other problems that require sophisticated relational reasoning. Our solution achieves state-of-the-art performance on the traditional interaction recognition datasets SBU and UT, as well as on the mutual actions from the large-scale NTU RGB+D dataset. Furthermore, it obtains competitive performance on the interactions subset of the NTU RGB+D 120 dataset.
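The abstract describes a relation-network-style design: every pair of joints (one from each person) is turned into a feature vector augmented with distance and motion, each pair is processed by a shared relation module, and the pooled result feeds a reasoning module. The sketch below illustrates that idea for a single frame pair in NumPy. The names g_theta/f_phi follow the generic Relation Network convention, and all shapes and weights here are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def pairwise_features(p1, p2, p1_prev, p2_prev):
    """Build a feature vector for every (joint of person 1, joint of
    person 2) pair: both joints' coordinates plus the structured extras
    mentioned in the abstract -- inter-joint distance and per-joint motion."""
    feats = []
    for i in range(p1.shape[0]):
        for j in range(p2.shape[0]):
            dist = np.linalg.norm(p1[i] - p2[j])          # distance between the pair
            motion = np.concatenate([p1[i] - p1_prev[i],  # displacement since the
                                     p2[j] - p2_prev[j]]) # previous frame
            feats.append(np.concatenate([p1[i], p2[j], [dist], motion]))
    return np.stack(feats)  # (num_pairs, feat_dim)

def mlp(x, w1, w2):
    """One hidden ReLU layer; stands in for a learned relation module."""
    return np.maximum(x @ w1, 0) @ w2

# Toy skeletons: 15 joints, 3-D coordinates, two consecutive frames.
J, D, H, C = 15, 3, 32, 8
p1, p2 = rng.normal(size=(J, D)), rng.normal(size=(J, D))
p1_prev, p2_prev = rng.normal(size=(J, D)), rng.normal(size=(J, D))

pairs = pairwise_features(p1, p2, p1_prev, p2_prev)   # (225, 13)

# g_theta relates each joint pair; summing makes the result invariant
# to the order in which pairs are enumerated.
w1g, w2g = rng.normal(size=(pairs.shape[1], H)), rng.normal(size=(H, H))
relations = mlp(pairs, w1g, w2g).sum(axis=0)          # (H,)

# f_phi reasons over the aggregated relations to produce class scores.
w1f, w2f = rng.normal(size=(H, H)), rng.normal(size=(H, C))
logits = mlp(relations[None, :], w1f, w2f)            # per-frame scores
print(logits.shape)  # (1, 8)
```

In the full model these per-frame relation outputs would be fed to an LSTM across time, which is what the abstract credits for the sequential relational reasoning.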

Results

Task | Dataset | Metric | Value | Model
Human Interaction Recognition | NTU RGB+D | Accuracy (Cross-Subject) | 90.5 | LSTM-IRN_fc1 inter+intra
Human Interaction Recognition | NTU RGB+D | Accuracy (Cross-View) | 93.5 | LSTM-IRN_fc1 inter+intra
Human Interaction Recognition | SBU | Accuracy | 98.2 | LSTM-IRN_fc1 inter+intra
Human Interaction Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup) | 79.6 | LSTM-IRN
Human Interaction Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 77.7 | LSTM-IRN
Human Interaction Recognition | UT-Interaction | Accuracy (Set 1) | 98.3 | LSTM-IRN_fc1 inter+intra
Human Interaction Recognition | UT-Interaction | Accuracy (Set 2) | 96.7 | LSTM-IRN_fc1 inter+intra

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception (2025-06-26)
Feature Hallucination for Self-supervised Action Recognition (2025-06-25)
CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition (2025-06-25)
Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)
Adapting Vision-Language Models for Evaluating World Models (2025-06-22)
FreeQ-Graph: Free-form Querying with Semantic Consistent Scene Graph for 3D Scene Understanding (2025-06-16)