Metric: Top-1 Verb (higher is better)
| # | Model | Top-1 Verb | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | Ours | 58.88 | No | Dynamic Scene Understanding from Vision-Language... | 2025-01-20 | - |
| 2 | ClipSitu | 47.23 | No | ClipSitu: Effectively Leveraging CLIP for Condit... | 2023-07-02 | Code |
| 3 | CoFormer | 44.66 | No | Collaborative Transformers for Grounded Situatio... | 2022-03-30 | Code |
| 4 | SituFormer | 44.2 | No | Rethinking the Two-Stage Framework for Grounded ... | 2021-12-10 | Code |
| 5 | Kernel GraphNet | 43.27 | No | - | - | - |
| 6 | GSRTR | 40.63 | No | Grounded Situation Recognition with Transformers | 2021-11-19 | Code |
| 7 | JSL | 39.94 | No | Grounded Situation Recognition | 2020-03-26 | Code |
| 8 | ISL | 39.36 | No | Grounded Situation Recognition | 2020-03-26 | Code |
| 9 | CAQ + RE-VGG | 38.19 | No | - | - | Code |
| 10 | GraphNet | 36.72 | No | Situation Recognition with Graph Neural Networks | 2017-08-14 | Code |
| 11 | RNN + Fusion | 35.9 | No | Recurrent Models for Situation Recognition | 2017-03-18 | - |
| 12 | CRF + Aug | 34.12 | No | Commonly Uncommon: Semantic Sparsity in Situatio... | 2016-12-03 | Code |
| 13 | CRF | 32.34 | No | - | - | Code |