Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Contextual Action Recognition with R*CNN

Georgia Gkioxari, Ross Girshick, Jitendra Malik

2015-05-05 | ICCV 2015 12 | Tasks: Attribute, Human-Object Interaction Detection, General Classification, Action Recognition, Temporal Action Localization

Abstract

There are multiple cues in an image which reveal what action a person is performing. For example, a jogger has a pose that is characteristic for jogging, but the scene (e.g. road, trail) and the presence of other joggers can be an additional source of information. In this work, we exploit the simple observation that actions are accompanied by contextual cues to build a strong action recognition system. We adapt RCNN to use more than one region for classification while still maintaining the ability to localize the action. We call our system R*CNN. The action-specific models and the feature maps are trained jointly, allowing for action-specific representations to emerge. R*CNN achieves 90.2% mean AP on the PASCAL VOC Action dataset, outperforming all other approaches in the field by a significant margin. Last, we show that R*CNN is not limited to action recognition. In particular, R*CNN can also be used to tackle fine-grained tasks such as attribute classification. We validate this claim by reporting state-of-the-art performance on the Berkeley Attributes of People dataset.
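The abstract says only that R*CNN uses more than one region for classification. One plausible reading, assumed here rather than taken from the abstract, is that each action score adds a term from the primary (person) region to the best-scoring term among candidate context regions, followed by a softmax over actions. The sketch below illustrates that idea with NumPy. The function names, weight matrices, and feature shapes are all hypothetical, and the features stand in for whatever the network would produce.

```python
import numpy as np


def rstar_scores(primary_feat, secondary_feats, w_primary, w_secondary):
    """Score each action from a primary region plus its best context region.

    primary_feat:    (D,)   feature of the person box (assumed given)
    secondary_feats: (S, D) features of candidate context regions
    w_primary:       (A, D) per-action weights for the primary region
    w_secondary:     (A, D) per-action weights for context regions
    Returns per-action scores (A,) and, for each action, the index of the
    context region that was selected.
    """
    primary_term = w_primary @ primary_feat             # (A,)
    secondary_term = w_secondary @ secondary_feats.T    # (A, S)
    best_secondary = secondary_term.argmax(axis=1)      # (A,)
    scores = primary_term + secondary_term.max(axis=1)  # (A,)
    return scores, best_secondary


def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    z = np.exp(x - x.max())
    return z / z.sum()
```

Because the maximum is taken separately per action, each action can pick a different context region, which is one way action-specific cues (a road for jogging, a horse for riding) could be selected without extra supervision.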

Results

Task                                Dataset   Metric  Value  Model
Human-Object Interaction Detection  HICO      mAP     28.5   R*CNN
Object Detection                    HICO-DET  MAP     2.15   R*CNN
Object Detection                    Charades  MAP     0.99   R*CNN
3D                                  HICO-DET  MAP     2.15   R*CNN
3D                                  Charades  MAP     0.99   R*CNN
2D Classification                   HICO-DET  MAP     2.15   R*CNN
2D Classification                   Charades  MAP     0.99   R*CNN
2D Object Detection                 HICO-DET  MAP     2.15   R*CNN
2D Object Detection                 Charades  MAP     0.99   R*CNN
16k                                 HICO-DET  MAP     2.15   R*CNN
16k                                 Charades  MAP     0.99   R*CNN

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
MGFFD-VLM: Multi-Granularity Prompt Learning for Face Forgery Detection with VLM (2025-07-16)
Non-Adaptive Adversarial Face Generation (2025-07-16)
DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
Attributes Shape the Embedding Space of Face Recognition Models (2025-07-15)
COLIBRI Fuzzy Model: Color Linguistic-Based Representation and Interpretation (2025-07-15)
Ref-Long: Benchmarking the Long-context Referencing Capability of Long-context Language Models (2025-07-13)
RoHOI: Robustness Benchmark for Human-Object Interaction Detection (2025-07-12)