Grounded Human-Object Interaction Hotspots from Video

Tushar Nagarajan, Christoph Feichtenhofer, Kristen Grauman

2018-12-11 | ICCV 2019

Human-Object Interaction Detection, Object Recognition, Semantic Segmentation, Video-to-image Affordance Grounding

Abstract

Learning how to interact with objects is an important step towards embodied visual intelligence, but existing techniques suffer from heavy supervision or sensing requirements. We propose an approach to learn human-object interaction "hotspots" directly from video. Rather than treat affordances as a manually supervised semantic segmentation task, our approach learns about interactions by watching videos of real human behavior and anticipating afforded actions. Given a novel image or video, our model infers a spatial hotspot map indicating how an object would be manipulated in a potential interaction, even if the object is currently at rest. Through results with both first- and third-person video, we show the value of grounding affordances in real human-object interactions. Not only are our weakly supervised hotspots competitive with strongly supervised affordance methods, but they can also anticipate object interaction for novel object categories.

Results

Task	Dataset	Metric	Value	Model
Video-to-image Affordance Grounding	OPRA (28x28)	AUC-J	0.81	Hotspot
Video-to-image Affordance Grounding	OPRA (28x28)	KLD	1.47	Hotspot
Video-to-image Affordance Grounding	OPRA (28x28)	SIM	0.36	Hotspot
Video-to-image Affordance Grounding	EPIC-Hotspot	AUC-J	0.79	Hotspot
Video-to-image Affordance Grounding	EPIC-Hotspot	KLD	1.26	Hotspot
Video-to-image Affordance Grounding	EPIC-Hotspot	SIM	0.4	Hotspot
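
The table reports KLD and SIM without defining them. The sketch below assumes the definitions commonly used in saliency benchmarks (this page does not confirm which variant the authors used): both maps are normalized to sum to 1, SIM is the histogram intersection of the two distributions, and KLD is the KL divergence of the ground truth from the prediction. Higher is better for SIM and AUC-J; lower is better for KLD. The function names, the `EPS` constant, and the list-of-rows heatmap format are illustrative choices, not taken from the paper's code. AUC-J is omitted because it needs a threshold sweep over fixation points.

```python
import math

EPS = 1e-12  # assumed small constant to avoid division by zero and log(0)


def normalize(heatmap):
    """Flatten a 2D heatmap (list of rows) and scale it to sum to 1."""
    flat = [float(v) for row in heatmap for v in row]
    if any(v < 0 for v in flat):
        raise ValueError("heatmap values must be non-negative")
    total = sum(flat)
    if total <= 0:
        raise ValueError("heatmap must contain some positive mass")
    return [v / total for v in flat]


def _paired(pred, gt):
    """Normalize both maps and check that they have the same number of cells."""
    p, q = normalize(pred), normalize(gt)
    if len(p) != len(q):
        raise ValueError("prediction and ground truth must have the same size")
    return p, q


def sim(pred, gt):
    """Similarity: sum of element-wise minima of the two distributions (0 to 1)."""
    p, q = _paired(pred, gt)
    return sum(min(a, b) for a, b in zip(p, q))


def kld(pred, gt):
    """KL divergence of the ground-truth distribution from the prediction."""
    p, q = _paired(pred, gt)
    return sum(b * math.log(EPS + b / (a + EPS)) for a, b in zip(p, q))


if __name__ == "__main__":
    # Toy 2x2 maps: the prediction covers the top row, the ground truth the left column.
    predicted = [[1, 1], [0, 0]]
    ground_truth = [[1, 0], [1, 0]]
    print("SIM:", sim(predicted, ground_truth))
    print("KLD:", kld(predicted, ground_truth))
```

In the toy example the two maps share exactly half of their mass, so SIM is 0.5, while KLD is large because the prediction assigns zero probability to a cell the ground truth marks as active.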
