Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Recurrent Models for Situation Recognition

Arun Mallya, Svetlana Lazebnik

2017-03-18 · ICCV 2017 · 10
Tasks: Human-Object Interaction Detection, Grounded Situation Recognition, Prediction, Image Captioning

Abstract

This work proposes Recurrent Neural Network (RNN) models to predict structured 'image situations' -- actions and noun entities fulfilling semantic roles related to the action. In contrast to prior work relying on Conditional Random Fields (CRFs), we use a specialized action prediction network followed by an RNN for noun prediction. Our system obtains state-of-the-art accuracy on the challenging recent imSitu dataset, beating CRF-based models, including ones trained with additional data. Further, we show that specialized features learned from situation prediction can be transferred to the task of image captioning to more accurately describe human-object interactions.
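The two-stage pipeline in the abstract (a dedicated action network picks the verb, then an RNN emits one noun per semantic role of that verb) can be illustrated with the toy sketch below. It is not the authors' model: the vocabularies, the role order in `VERB_FRAMES`, the random weights, the plain tanh (Elman) recurrence, greedy decoding, and every name (`ToySituationModel`, `VERB_FRAMES`, `NOUNS`, `image_feat`) are hypothetical stand-ins chosen only to show how each predicted noun is fed back as input for the next role.

```python
import math
import random
from typing import Dict, List, Tuple

# Hypothetical toy vocabulary: each verb defines an ordered frame of semantic roles.
VERB_FRAMES: Dict[str, List[str]] = {
    "riding": ["agent", "vehicle", "place"],
    "cooking": ["agent", "food", "place"],
}
NOUNS: List[str] = ["<start>", "man", "woman", "horse", "bike", "pasta", "kitchen", "road"]
ROLES: List[str] = sorted({r for frame in VERB_FRAMES.values() for r in frame})


def matvec(w: List[List[float]], x: List[float]) -> List[float]:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def argmax(xs: List[float]) -> int:
    return max(range(len(xs)), key=xs.__getitem__)


class ToySituationModel:
    """Verb classifier followed by an Elman RNN that emits one noun per role."""

    def __init__(self, feat_dim: int = 8, hidden: int = 16, emb: int = 6, seed: int = 0):
        rng = random.Random(seed)

        def mat(rows: int, cols: int) -> List[List[float]]:
            return [[rng.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]

        self.verbs = sorted(VERB_FRAMES)
        self.w_verb = mat(len(self.verbs), feat_dim)  # stand-in for the action network
        self.verb_emb = mat(len(self.verbs), emb)
        self.noun_emb = mat(len(NOUNS), emb)
        self.role_emb = mat(len(ROLES), emb)
        self.w_in = mat(hidden, feat_dim + 3 * emb)
        self.w_h = mat(hidden, hidden)
        self.w_out = mat(len(NOUNS), hidden)

    def predict(self, feat: List[float]) -> Tuple[str, Dict[str, str]]:
        # Stage 1: pick the action (verb) from the image features alone.
        v = argmax(matvec(self.w_verb, feat))
        verb = self.verbs[v]
        # Stage 2: walk the verb's roles in order, feeding back the previous noun.
        h = [0.0] * len(self.w_h)
        prev = NOUNS.index("<start>")
        frame: Dict[str, str] = {}
        for role in VERB_FRAMES[verb]:
            x = feat + self.verb_emb[v] + self.noun_emb[prev] + self.role_emb[ROLES.index(role)]
            pre = [a + b for a, b in zip(matvec(self.w_in, x), matvec(self.w_h, h))]
            h = [math.tanh(p) for p in pre]
            logits = matvec(self.w_out, h)
            logits[0] = -math.inf  # never emit the <start> token as a role value
            prev = argmax(logits)
            frame[role] = NOUNS[prev]
        return verb, frame


rng = random.Random(1)
image_feat = [rng.uniform(-1.0, 1.0) for _ in range(8)]  # hypothetical CNN feature vector
print(ToySituationModel().predict(image_feat))
```

Because the weights are untrained, the printed situation is arbitrary; the point is the control flow. The verb fixes which roles exist, and the recurrence lets later nouns depend on earlier ones, which is the dependency a CRF would otherwise model with pairwise potentials.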

Results

| Task | Dataset | Metric | Value (%) | Model |
|---|---|---|---|---|
| Situation Recognition | imSitu | Top-1 Verb | 35.9 | RNN + Fusion |
| Situation Recognition | imSitu | Top-1 Verb & Value | 27.45 | RNN + Fusion |
| Situation Recognition | imSitu | Top-5 Verbs | 63.08 | RNN + Fusion |
| Situation Recognition | imSitu | Top-5 Verbs & Value | 46.88 | RNN + Fusion |
| Situation Recognition | SWiG | Top-1 Verb | 35.9 | RNN + Fusion |
| Situation Recognition | SWiG | Top-1 Verb & Value | 27.45 | RNN + Fusion |
| Situation Recognition | SWiG | Top-5 Verbs | 63.08 | RNN + Fusion |
| Situation Recognition | SWiG | Top-5 Verbs & Value | 46.88 | RNN + Fusion |
| Grounded Situation Recognition | SWiG | Top-1 Verb | 35.9 | RNN + Fusion |
| Grounded Situation Recognition | SWiG | Top-1 Verb & Value | 27.45 | RNN + Fusion |
| Grounded Situation Recognition | SWiG | Top-5 Verbs | 63.08 | RNN + Fusion |
| Grounded Situation Recognition | SWiG | Top-5 Verbs & Value | 46.88 | RNN + Fusion |
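The "Verb" and "Verb & Value" metrics in the results table can be sketched as follows, assuming the usual imSitu protocol as commonly described (verify against the official evaluation code): a verb counts as correct if the gold verb is among the top-k predictions; a role's noun counts as correct if it matches the noun any annotator gave for that role; value credit is only awarded when the verb is correct, averaged over the frame's roles and then over images. The helpers `image_scores` and `dataset_scores` and the example frames are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Frame = Dict[str, str]  # role -> noun


def image_scores(
    ranked: Sequence[Tuple[str, Frame]],  # model predictions, best verb first
    gold_verb: str,
    gold_frames: Sequence[Frame],         # one frame per annotator
    k: int,
) -> Tuple[float, float]:
    """Return (verb hit, fraction of roles correct) for one image at top-k."""
    for verb, frame in ranked[:k]:
        if verb == gold_verb:
            roles = list(gold_frames[0])
            correct = sum(
                1 for role in roles
                if any(frame.get(role) == g[role] for g in gold_frames)
            )
            return 1.0, correct / len(roles)
    return 0.0, 0.0  # verb missed: no credit for any role values


def dataset_scores(per_image: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Average per-image scores and report them as percentages."""
    n = len(per_image)
    verb = 100.0 * sum(v for v, _ in per_image) / n
    value = 100.0 * sum(s for _, s in per_image) / n
    return verb, value


ranked = [
    ("cooking", {"agent": "man", "food": "pasta", "place": "kitchen"}),
    ("riding", {"agent": "man", "vehicle": "horse", "place": "road"}),
]
gold = [
    {"agent": "man", "vehicle": "horse", "place": "field"},
    {"agent": "person", "vehicle": "horse", "place": "field"},
    {"agent": "man", "vehicle": "horse", "place": "outdoors"},
]
print(image_scores(ranked, "riding", gold, k=1))  # (0.0, 0.0)
print(image_scores(ranked, "riding", gold, k=5))  # verb hit, 2 of 3 roles correct
```

Under this reading, a top-5 score is always at least the corresponding top-1 score, which is consistent with the gap between the Top-1 and Top-5 rows above.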

Related Papers

- Multi-Strategy Improved Snake Optimizer Accelerated CNN-LSTM-Attention-Adaboost for Trajectory Prediction (2025-07-21)
- Language-Guided Contrastive Audio-Visual Masked Autoencoder with Automatically Generated Audio-Visual-Text Triplets from Videos (2025-07-16)
- Generative Click-through Rate Prediction with Applications to Search Advertising (2025-07-15)
- RoHOI: Robustness Benchmark for Human-Object Interaction Detection (2025-07-12)
- Conformation-Aware Structure Prediction of Antigen-Recognizing Immune Proteins (2025-07-11)
- Bilateral Collaboration with Large Vision-Language Models for Open Vocabulary Human-Object Interaction Detection (2025-07-09)
- Foundation models for time series forecasting: Application in conformal prediction (2025-07-09)
- Predicting Graph Structure via Adapted Flux Balance Analysis (2025-07-08)