Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Efficient Two-Stage Detection of Human-Object Interactions with a Novel Unary-Pairwise Transformer

Frederic Z. Zhang, Dylan Campbell, Stephen Gould

2021-12-03 · CVPR 2022 · Human-Object Interaction Detection
Paper · PDF · Code (official)

Abstract

Recent developments in transformer models for visual data have led to significant improvements in recognition and detection tasks. In particular, using learnable queries in place of region proposals has given rise to a new class of one-stage detection models, spearheaded by the Detection Transformer (DETR). Variations on this one-stage approach have since dominated human-object interaction (HOI) detection. However, the success of such one-stage HOI detectors can largely be attributed to the representation power of transformers. We discovered that when equipped with the same transformer, their two-stage counterparts can be more performant and memory-efficient, while taking a fraction of the time to train. In this work, we propose the Unary-Pairwise Transformer, a two-stage detector that exploits unary and pairwise representations for HOIs. We observe that the unary and pairwise parts of our transformer network specialise, with the former preferentially increasing the scores of positive examples and the latter decreasing the scores of negative examples. We evaluate our method on the HICO-DET and V-COCO datasets, and significantly outperform state-of-the-art approaches. At inference time, our model with ResNet50 approaches real-time performance on a single GPU.
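The abstract describes two specialised streams: unary terms that preferentially raise the scores of positive human-object pairs, and pairwise terms that suppress negatives, fused into a final interaction score. The sketch below is a hypothetical illustration of that two-stream idea only, assuming simple linear heads and additive fusion; it is not the authors' architecture, and all names and dimensions are invented for illustration.

```python
import torch
import torch.nn as nn

class ToyUnaryPairwiseScorer(nn.Module):
    """Illustrative sketch: score a human-object pair from unary (per-box)
    and pairwise (joint) representations. Not the UPT implementation."""

    def __init__(self, feat_dim: int, num_actions: int):
        super().__init__()
        # Unary head: scores each box representation independently
        # (the stream the paper says boosts positive examples).
        self.unary = nn.Linear(feat_dim, num_actions)
        # Pairwise head: scores the concatenated human-object pair
        # (the stream the paper says suppresses negative examples).
        self.pairwise = nn.Linear(2 * feat_dim, num_actions)

    def forward(self, human_feats: torch.Tensor, object_feats: torch.Tensor) -> torch.Tensor:
        pair = torch.cat([human_feats, object_feats], dim=-1)
        logits = (
            self.unary(human_feats)
            + self.unary(object_feats)
            + self.pairwise(pair)
        )
        # Per-action interaction probabilities in (0, 1).
        return torch.sigmoid(logits)
```

In the actual paper the two streams are transformer layers operating on detector features, and the final scores also incorporate the backbone detector's object confidences; the sketch keeps only the unary-plus-pairwise fusion structure.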

Results

Task: Human-Object Interaction Detection

V-COCO

| Model        | AP (S1) | AP (S2) | Time per frame (ms) |
|--------------|---------|---------|---------------------|
| UPT-R50      | 59.0    | 64.5    | 43                  |
| UPT-R101     | 60.7    | 66.2    | 64                  |
| UPT-R101-DC5 | 61.3    | 67.1    | 131                 |

HICO-DET

| Model        | mAP   | Time per frame (ms) |
|--------------|-------|---------------------|
| UPT-R50      | 31.66 | 42                  |
| UPT-R101     | 32.31 | 61                  |
| UPT-R101-DC5 | 32.62 | 124                 |

Related Papers

- RoHOI: Robustness Benchmark for Human-Object Interaction Detection (2025-07-12)
- Bilateral Collaboration with Large Vision-Language Models for Open Vocabulary Human-Object Interaction Detection (2025-07-09)
- VolumetricSMPL: A Neural Volumetric Body Model for Efficient Interactions, Contacts, and Collisions (2025-06-29)
- HOIverse: A Synthetic Scene Graph Dataset With Human Object Interactions (2025-06-24)
- On the Robustness of Human-Object Interaction Detection against Distribution Shift (2025-06-22)
- Egocentric Human-Object Interaction Detection: A New Benchmark and Method (2025-06-17)
- InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio Conditions (2025-06-11)
- HunyuanVideo-HOMA: Generic Human-Object Interaction in Multimodal Driven Human Animation (2025-06-10)