Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions

Huaizu Jiang, Xiaojian Ma, Weili Nie, Zhiding Yu, Yuke Zhu, Song-Chun Zhu, Anima Anandkumar

Published: 2022-05-27 · CVPR 2022
Tasks: Few-Shot Learning · Benchmarking · Representation Learning · Human-Object Interaction Detection · Few-Shot Image Classification · Visual Reasoning
Links: Paper · PDF · Code (official)

Abstract

A significant gap remains between today's visual pattern recognition models and human-level visual cognition, especially when it comes to few-shot learning and compositional reasoning of novel concepts. We introduce Bongard-HOI, a new visual reasoning benchmark that focuses on compositional learning of human-object interactions (HOIs) from natural images. It is inspired by two desirable characteristics of the classical Bongard problems (BPs): 1) few-shot concept learning, and 2) context-dependent reasoning. We carefully curate the few-shot instances with hard negatives, where positive and negative images disagree only on action labels, making mere recognition of object categories insufficient to complete our benchmarks. We also design multiple test sets to systematically study the generalization of visual learning models, varying the overlap of HOI concepts between the training and test sets of few-shot instances from partial to no overlap. Bongard-HOI presents a substantial challenge to today's visual recognition models. The state-of-the-art HOI detection model achieves only 62% accuracy on few-shot binary prediction, while even amateur human testers on MTurk reach 91% accuracy. With the Bongard-HOI benchmark, we hope to further advance research efforts in visual reasoning, especially in holistic perception-reasoning systems and better representation learning.
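The few-shot binary prediction task the abstract describes can be sketched as follows: each benchmark instance provides a support set of positive and negative images (only the action label separates the two sides), and a model must label held-out queries; the reported metric is average accuracy over all queries. The sketch below is illustrative only, assuming precomputed image features and a toy nearest-centroid decision rule; the function names, the 6-per-side support size, and the classifier are assumptions, not the paper's method.

```python
# Hypothetical sketch of few-shot binary prediction on a Bongard-HOI-style
# instance. Features, support sizes, and the nearest-centroid rule are
# illustrative assumptions, not the benchmark's official evaluation code.
import numpy as np

def predict(query_feat, pos_feats, neg_feats):
    """Label a query 1 (positive) or 0 (negative) by comparing its
    distance to the centroid of each side of the support set."""
    d_pos = np.linalg.norm(query_feat - pos_feats.mean(axis=0))
    d_neg = np.linalg.norm(query_feat - neg_feats.mean(axis=0))
    return 1 if d_pos < d_neg else 0

def avg_accuracy(instances):
    """instances: list of (pos_feats, neg_feats, query_feat, label).
    Returns mean accuracy over all queries, mirroring the 'Avg. Accuracy'
    metric in the results table in spirit."""
    correct = [predict(q, p, n) == y for p, n, q, y in instances]
    return float(np.mean(correct))

# Toy example: well-separated feature clusters for the two sides.
rng = np.random.default_rng(0)
pos = rng.normal(1.0, 0.1, size=(6, 8))  # 6 positive support images
neg = rng.normal(0.0, 0.1, size=(6, 8))  # 6 negative support images
instances = [(pos, neg, rng.normal(1.0, 0.1, size=8), 1),
             (pos, neg, rng.normal(0.0, 0.1, size=8), 0)]
print(avg_accuracy(instances))  # 1.0 on this separable toy data
```

On real Bongard-HOI instances the hard negatives make such appearance-level baselines weak, which is why the table's learned models sit near chance while humans reach 91.42.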

Results

Task                          | Dataset     | Metric        | Value | Model
Image Classification          | Bongard-HOI | Avg. Accuracy | 91.42 | Human (Amateur)
Image Classification          | Bongard-HOI | Avg. Accuracy | 55.82 | Meta-Baseline (ImageNet_R50)
Image Classification          | Bongard-HOI | Avg. Accuracy | 54.3  | Meta-Baseline (MoCov2_R50)
Image Classification          | Bongard-HOI | Avg. Accuracy | 54.23 | Meta-Baseline (Scratch_R50)
Image Classification          | Bongard-HOI | Avg. Accuracy | 49.74 | ANIL (ImageNet_R50)
Few-Shot Image Classification | Bongard-HOI | Avg. Accuracy | 91.42 | Human (Amateur)
Few-Shot Image Classification | Bongard-HOI | Avg. Accuracy | 55.82 | Meta-Baseline (ImageNet_R50)
Few-Shot Image Classification | Bongard-HOI | Avg. Accuracy | 54.3  | Meta-Baseline (MoCov2_R50)
Few-Shot Image Classification | Bongard-HOI | Avg. Accuracy | 54.23 | Meta-Baseline (Scratch_R50)
Few-Shot Image Classification | Bongard-HOI | Avg. Accuracy | 49.74 | ANIL (ImageNet_R50)

Related Papers

Visual Place Recognition for Large-Scale UAV Applications (2025-07-20)
Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper (2025-07-20)
GLAD: Generalizable Tuning for Vision-Language Models (2025-07-17)
Training Transformers with Enforced Lipschitz Constants (2025-07-17)
Disentangling coincident cell events using deep transfer learning and compressive sensing (2025-07-17)
MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
Boosting Team Modeling through Tempo-Relational Representation Learning (2025-07-17)