Metric: Top-1 Verb & Grounded-Value (higher is better)
| # | Model | Top-1 Verb & Grounded-Value | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | Ours (CoFormer+) | 41.28 | No | Dynamic Scene Understanding from Vision-Language... | 2025-01-20 | - |
| 2 | ClipSitu | 40.01 | No | ClipSitu: Effectively Leveraging CLIP for Condit... | 2023-07-02 | Code |
| 3 | SituFormer | 29.22 | No | Rethinking the Two-Stage Framework for Grounded ... | 2021-12-10 | Code |
| 4 | CoFormer | 29.05 | No | Collaborative Transformers for Grounded Situatio... | 2022-03-30 | Code |
| 5 | GSRTR | 25.49 | No | Grounded Situation Recognition with Transformers | 2021-11-19 | Code |
| 6 | JSL | 24.86 | No | Grounded Situation Recognition | 2020-03-26 | Code |
| 7 | ISL | 22.73 | No | Grounded Situation Recognition | 2020-03-26 | Code |