Pose-based Modular Network for Human-Object Interaction Detection

Zhijun Liang, Junfa Liu, Yisheng Guan, Juan Rojas

2020-08-05 Human-Object Interaction Detection, Scene Understanding

Abstract

Human-object interaction (HOI) detection is a critical task in scene understanding. The goal is to infer the triplet <subject, predicate, object> in a scene. In this work, we note that the human pose itself, as well as the relative spatial information of the human pose with respect to the target object, can provide informative cues for HOI detection. We contribute a Pose-based Modular Network (PMN) that exploits absolute pose features and relative spatial pose features to improve HOI detection and is fully compatible with existing networks. Our module consists of two branches: one processes the relative spatial pose features of each joint independently, and the other updates the absolute pose features via fully connected graph structures. The processed pose features are then fed into an action classifier. To evaluate the proposed method, we combine the module with the state-of-the-art model VS-GATs and obtain significant improvements on two public benchmarks, V-COCO and HICO-DET, which demonstrates its efficacy and flexibility. Code is available at https://github.com/birlrobotics/PMN.
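The two pose branches described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository linked above for that). Every function name here is hypothetical, the normalisation choices (offsets from the object-box centre, coordinates scaled by box size) are assumptions, and the learned layers and the action classifier are omitted. The graph update is a plain mean over the other joints, standing in for whatever learned message passing the actual model uses.

```python
import numpy as np


def relative_spatial_pose(joints, obj_box):
    """Per-joint offset from the object-box centre, scaled by the box size.

    Assumed encoding: the paper may define these features differently.
    """
    x1, y1, x2, y2 = obj_box
    center = np.array([(x1 + x2) / 2.0, (y1 + y2) / 2.0])
    size = np.array([x2 - x1, y2 - y1], dtype=float)
    return (joints - center) / size  # shape (num_joints, 2)


def absolute_pose(joints, human_box):
    """Joint coordinates normalised to the human bounding box (assumed)."""
    x1, y1, x2, y2 = human_box
    origin = np.array([x1, y1], dtype=float)
    size = np.array([x2 - x1, y2 - y1], dtype=float)
    return (joints - origin) / size  # shape (num_joints, 2)


def fully_connected_update(node_feats):
    """One message-passing round on a fully connected graph.

    Each joint adds the mean feature of all *other* joints to its own.
    Learned weights are deliberately left out of this sketch.
    """
    n = node_feats.shape[0]
    total = node_feats.sum(axis=0, keepdims=True)
    mean_of_others = (total - node_feats) / (n - 1)
    return node_feats + mean_of_others


def pose_features(joints, human_box, obj_box):
    """Concatenate both branches into one vector for a downstream classifier."""
    rel = relative_spatial_pose(joints, obj_box)
    abs_updated = fully_connected_update(absolute_pose(joints, human_box))
    return np.concatenate([rel.ravel(), abs_updated.ravel()])


# Toy example: three joints, one human box, one object box (pixel coordinates).
joints = np.array([[10.0, 10.0], [20.0, 30.0], [30.0, 20.0]])
human_box = (0.0, 0.0, 40.0, 40.0)
obj_box = (20.0, 20.0, 40.0, 40.0)
features = pose_features(joints, human_box, obj_box)
```

In this sketch the relative branch treats each joint independently, while the absolute branch lets every joint see all the others, mirroring the split the abstract describes. The resulting vector would be passed to an action classifier alongside the features of the host network.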

Results

Task	Dataset	Metric	Value	Model
Human-Object Interaction Detection	V-COCO	AP(S1)	51.8	PMN
Human-Object Interaction Detection	HICO-DET	mAP	21.21	PMN
