Video-to-image Affordance Grounding
8 benchmarks4 papers
Given a demonstration video V and a target image I, the goal of video-to-image affordance grounding predict an affordance heatmap over the target image according to the hand-interacted region in the video, accompanied by the affordance action (e.g., press, turn).