TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/Perceiving 3D Human-Object Spatial Arrangements from a Sin...

Perceiving 3D Human-Object Spatial Arrangements from a Single Image in the Wild

Jason Y. Zhang, Sam Pepose, Hanbyul Joo, Deva Ramanan, Jitendra Malik, Angjoo Kanazawa

2020-07-30ECCV 2020 83D Human Pose Estimation3D Shape ReconstructionCommon Sense ReasoningHuman-Object Interaction Detection3D Object Reconstruction3D Human Reconstruction3D Shape Reconstruction From A Single 2D Image
PaperPDFCodeCode(official)

Abstract

We present a method that infers spatial arrangements and shapes of humans and objects in a globally consistent 3D scene, all from a single image in-the-wild captured in an uncontrolled environment. Notably, our method runs on datasets without any scene- or object-level 3D supervision. Our key insight is that considering humans and objects jointly gives rise to "3D common sense" constraints that can be used to resolve ambiguity. In particular, we introduce a scale loss that learns the distribution of object size from data; an occlusion-aware silhouette re-projection loss to optimize object pose; and a human-object interaction loss to capture the spatial layout of objects with which humans interact. We empirically validate that our constraints dramatically reduce the space of likely 3D spatial configurations. We demonstrate our approach on challenging, in-the-wild images of humans interacting with large objects (such as bicycles, motorcycles, and surfboards) and handheld objects (such as laptops, tennis rackets, and skateboards). We quantify the ability of our approach to recover human-object arrangements and outline remaining challenges in this relatively domain. The project webpage can be found at https://jasonyzhang.com/phosa.

Results

TaskDatasetMetricValueModel
ReconstructionBEHAVEChamfer Distance12.17PHOSA
Object ReconstructionBEHAVEChamfer Distance26.62PHOSA
3D Object ReconstructionBEHAVEChamfer Distance26.62PHOSA

Related Papers

Comparing Apples to Oranges: A Dataset & Analysis of LLM Humour Understanding from Traditional Puns to Topical Jokes2025-07-17RoHOI: Robustness Benchmark for Human-Object Interaction Detection2025-07-12Bilateral Collaboration with Large Vision-Language Models for Open Vocabulary Human-Object Interaction Detection2025-07-09LoSiA: Efficient High-Rank Fine-Tuning via Subnet Localization and Optimization2025-07-06VolumetricSMPL: A Neural Volumetric Body Model for Efficient Interactions, Contacts, and Collisions2025-06-29Systematic Comparison of Projection Methods for Monocular 3D Human Pose Estimation on Fisheye Images2025-06-24HOIverse: A Synthetic Scene Graph Dataset With Human Object Interactions2025-06-24On the Robustness of Human-Object Interaction Detection against Distribution Shift2025-06-22