Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Grounded Situation Recognition with Transformers

Junhyeong Cho, Youngseok Yoon, Hyeonjun Lee, Suha Kwak

2021-11-19
Visual Grounding, Image Classification, Grounded Situation Recognition, Scene Understanding, Visual Reasoning, Object Detection

Abstract

Grounded Situation Recognition (GSR) is the task of not only classifying a salient action (verb), but also predicting the entities (nouns) associated with semantic roles and their locations in the given image. Inspired by the remarkable success of Transformers in vision tasks, we propose a GSR model based on a Transformer encoder-decoder architecture. The attention mechanism of our model enables accurate verb classification by effectively capturing high-level semantic features of an image, and allows the model to flexibly deal with the complicated and image-dependent relations between entities for improved noun classification and localization. Our model is the first Transformer architecture for GSR, and achieves the state of the art in every evaluation metric on the SWiG benchmark. Our code is available at https://github.com/jhcho99/gsrtr.

Results

Task                           | Dataset | Metric                         | Value | Model
-------------------------------|---------|--------------------------------|-------|------
Situation Recognition          | imSitu  | Top-1 Verb                     | 40.63 | GSRTR
Situation Recognition          | imSitu  | Top-1 Verb & Value             | 32.15 | GSRTR
Situation Recognition          | imSitu  | Top-5 Verbs                    | 69.81 | GSRTR
Situation Recognition          | imSitu  | Top-5 Verbs & Value            | 54.13 | GSRTR
Situation Recognition          | SWiG    | Top-1 Verb                     | 40.63 | GSRTR
Situation Recognition          | SWiG    | Top-1 Verb & Grounded-Value    | 25.49 | GSRTR
Situation Recognition          | SWiG    | Top-1 Verb & Value             | 32.15 | GSRTR
Situation Recognition          | SWiG    | Top-5 Verbs                    | 69.81 | GSRTR
Situation Recognition          | SWiG    | Top-5 Verbs & Grounded-Value   | 42.5  | GSRTR
Situation Recognition          | SWiG    | Top-5 Verbs & Value            | 54.13 | GSRTR
Grounded Situation Recognition | SWiG    | Top-1 Verb                     | 40.63 | GSRTR
Grounded Situation Recognition | SWiG    | Top-1 Verb & Grounded-Value    | 25.49 | GSRTR
Grounded Situation Recognition | SWiG    | Top-1 Verb & Value             | 32.15 | GSRTR
Grounded Situation Recognition | SWiG    | Top-5 Verbs                    | 69.81 | GSRTR
Grounded Situation Recognition | SWiG    | Top-5 Verbs & Grounded-Value   | 42.5  | GSRTR
Grounded Situation Recognition | SWiG    | Top-5 Verbs & Value            | 54.13 | GSRTR
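
As a rough illustration of how the "Top-1 Verb" and "Top-5 Verbs" rows could be read, the sketch below computes a generic top-k accuracy over verb scores. It assumes these metrics are ordinary top-k classification accuracies reported as percentages; this page does not define them, and the sketch does not cover the "Value" or "Grounded-Value" metrics. The function name `top_k_verb_accuracy` and the toy data are hypothetical and are not taken from the official GSRTR code.

```python
from typing import Sequence


def top_k_verb_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int,
) -> float:
    """Percentage of images whose true verb index is among the k highest scores.

    Hypothetical helper: assumes one score per verb class for each image.
    """
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and equal in length")
    hits = 0
    for row, label in zip(scores, labels):
        # Rank verb indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)


# Toy data (made up for illustration): 3 images, 3 verb classes.
toy_scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
toy_labels = [1, 1, 0]

top1 = top_k_verb_accuracy(toy_scores, toy_labels, k=1)
top2 = top_k_verb_accuracy(toy_scores, toy_labels, k=2)
top3 = top_k_verb_accuracy(toy_scores, toy_labels, k=3)
```

With k equal to the number of classes, every image counts as a hit, so a top-k score can never be lower than the top-1 score on the same predictions. That is consistent with the Top-5 Verbs values in the table exceeding the Top-1 Verb values.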

Related Papers

Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations (2025-07-18)
Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy (2025-07-17)
Federated Learning for Commercial Image Sources (2025-07-17)
MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
Advancing Complex Wide-Area Scene Understanding with Hierarchical Coresets Selection (2025-07-17)
Argus: Leveraging Multiview Images for Improved 3-D Scene Understanding With Large Language Models (2025-07-17)
City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)