Grounded Situation Recognition with Transformers

Junhyeong Cho, Youngseok Yoon, Hyeonjun Lee, Suha Kwak

2021-11-19

Visual Grounding, Image Classification, Grounded Situation Recognition, Scene Understanding, Visual Reasoning, Object Detection

Abstract

Grounded Situation Recognition (GSR) is the task of not only classifying a salient action (verb), but also predicting the entities (nouns) associated with its semantic roles and their locations in the given image. Inspired by the remarkable success of Transformers in vision tasks, we propose a GSR model based on a Transformer encoder-decoder architecture. The attention mechanism of our model enables accurate verb classification by effectively capturing high-level semantic features of an image, and allows the model to deal flexibly with the complicated and image-dependent relations between entities for improved noun classification and localization. Our model is the first Transformer architecture for GSR, and achieves the state of the art in every evaluation metric on the SWiG benchmark. Our code is available at https://github.com/jhcho99/gsrtr.
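To make the task output concrete, here is a minimal Python sketch of what a single GSR prediction could look like: one verb, plus a noun and an optional bounding box for each semantic role. The class names, the `(x_min, y_min, x_max, y_max)` box convention, and the example verb, roles, and coordinates are all hypothetical illustrations; they are not taken from the GSRTR code or the SWiG annotation format.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical box convention: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class RolePrediction:
    """Noun predicted for one semantic role, with an optional location."""
    noun: str
    box: Optional[Box]  # None if the entity is not localized in the image


@dataclass
class Situation:
    """One GSR output: a verb and the entities filling its roles."""
    verb: str
    roles: Dict[str, RolePrediction]


def describe(situation: Situation) -> str:
    """Render a situation as a single human-readable line."""
    parts = []
    for role, pred in situation.roles.items():
        where = "ungrounded" if pred.box is None else f"box={pred.box}"
        parts.append(f"{role}={pred.noun} ({where})")
    return f"{situation.verb}: " + ", ".join(parts)


# Made-up example values, for illustration only.
example = Situation(
    verb="feeding",
    roles={
        "agent": RolePrediction("person", (10.0, 20.0, 110.0, 220.0)),
        "food": RolePrediction("hay", (120.0, 150.0, 160.0, 200.0)),
        "place": RolePrediction("farm", None),
    },
)

print(describe(example))
```

The verb determines which roles exist, which is why the roles are kept in a dictionary keyed by role name rather than in a fixed set of fields.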

Results

Task	Dataset	Metric	Value (%)	Model
Situation Recognition	imSitu	Top-1 Verb	40.63	GSRTR
Situation Recognition	imSitu	Top-1 Verb & Value	32.15	GSRTR
Situation Recognition	imSitu	Top-5 Verbs	69.81	GSRTR
Situation Recognition	imSitu	Top-5 Verbs & Value	54.13	GSRTR
Situation Recognition	SWiG	Top-1 Verb	40.63	GSRTR
Situation Recognition	SWiG	Top-1 Verb & Grounded-Value	25.49	GSRTR
Situation Recognition	SWiG	Top-1 Verb & Value	32.15	GSRTR
Situation Recognition	SWiG	Top-5 Verbs	69.81	GSRTR
Situation Recognition	SWiG	Top-5 Verbs & Grounded-Value	42.5	GSRTR
Situation Recognition	SWiG	Top-5 Verbs & Value	54.13	GSRTR
Grounded Situation Recognition	SWiG	Top-1 Verb	40.63	GSRTR
Grounded Situation Recognition	SWiG	Top-1 Verb & Grounded-Value	25.49	GSRTR
Grounded Situation Recognition	SWiG	Top-1 Verb & Value	32.15	GSRTR
Grounded Situation Recognition	SWiG	Top-5 Verbs	69.81	GSRTR
Grounded Situation Recognition	SWiG	Top-5 Verbs & Grounded-Value	42.5	GSRTR
Grounded Situation Recognition	SWiG	Top-5 Verbs & Value	54.13	GSRTR
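The gap between the "Value" and "Grounded-Value" rows can be illustrated with a simplified per-role check. The sketch below assumes that a value counts as correct when the verb is right and the predicted noun matches one of the annotated nouns, and that a grounded value additionally requires the predicted box to overlap the annotated box with an intersection-over-union (IoU) of at least 0.5. The function names, the threshold default, and the handling of ungrounded roles are assumptions made for illustration; this is not the benchmark's evaluation script.

```python
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def value_correct(pred_verb: str, gold_verb: str,
                  pred_noun: str, gold_nouns: Sequence[str]) -> bool:
    """Noun for a role is correct only if the verb is also correct."""
    return pred_verb == gold_verb and pred_noun in gold_nouns


def grounded_value_correct(pred_verb: str, gold_verb: str,
                           pred_noun: str, gold_nouns: Sequence[str],
                           pred_box: Optional[Box], gold_box: Optional[Box],
                           iou_threshold: float = 0.5) -> bool:
    """Value is correct and the predicted box matches the annotated one."""
    if not value_correct(pred_verb, gold_verb, pred_noun, gold_nouns):
        return False
    if gold_box is None:
        # Assumed convention: an ungrounded role must be predicted as such.
        return pred_box is None
    if pred_box is None:
        return False
    return iou(pred_box, gold_box) >= iou_threshold


# Right verb and noun, but a poorly placed box: value passes, grounding fails.
ok_value = value_correct("feeding", "feeding", "person", ["person", "man"])
ok_grounded = grounded_value_correct(
    "feeding", "feeding", "person", ["person", "man"],
    pred_box=(0.0, 0.0, 10.0, 10.0), gold_box=(8.0, 8.0, 20.0, 20.0),
)
print(ok_value, ok_grounded)
```

Because the grounded check is strictly harder than the plain value check, a grounded-value score can never exceed the corresponding value score under this scheme, which is consistent with the ordering of the rows in the table.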