Situation Recognition aims to produce a structured summary of an image that describes the primary activity (verb) and its relevant entities (nouns).
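To make the shape of this structured summary concrete, the following is a minimal sketch, assuming one predicted verb plus a mapping from each of the verb's semantic roles to the noun that fills it. The `Situation` class, its `summary` method, and the role names (`agent`, `source`, `tool`, `item`, `place`) are hypothetical and chosen for illustration only. They are not an official format or the output of any specific model.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Situation:
    """Hypothetical structured image summary: one verb plus role -> noun fillers."""

    verb: str
    # Each semantic role of the verb maps to the entity (noun) filling it;
    # None marks a role with no visible filler in the image.
    roles: Dict[str, Optional[str]] = field(default_factory=dict)

    def summary(self) -> str:
        # Render only the roles that have a filler, in insertion order.
        filled = [f"{role}={noun}" for role, noun in self.roles.items() if noun is not None]
        return f"{self.verb}({', '.join(filled)})"


# Hypothetical prediction for an image of a man clipping a sheep's wool.
s = Situation(
    verb="clipping",
    roles={"agent": "man", "source": "sheep", "tool": "shears", "item": "wool", "place": None},
)
print(s.summary())  # clipping(agent=man, source=sheep, tool=shears, item=wool)
```

In this sketch, the verb determines which roles exist, and the nouns are predicted per role. This is why the summary is structured rather than a free-form caption.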