Metric: AnswerExactMatch (Question Answering) (higher is better; rows are listed in ascending order of score)
| # | Model | AnswerExactMatch (Question Answering) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | MCAN | 43.42 | No | Deep Modular Co-Attention Networks for Visual Qu... | 2019-06-25 | Code |
| 2 | ScanQA | 46.58 | No | SQA3D: Situated Question Answering in 3D Scenes | 2022-10-14 | Code |
| 3 | ScanQA (w/ auxiliary loss) | 47.2 | Yes | SQA3D: Situated Question Answering in 3D Scenes | 2022-10-14 | Code |
| 4 | LM4VisualEncoding | 48.09 | No | Frozen Transformers in Language Models Are Effec... | 2023-10-19 | Code |
| 5 | Lexicon3D | 50.7 | No | Lexicon3D: Probing Visual Foundation Models for ... | 2024-09-05 | Code |
| 6 | Situation3D | 52.6 | No | Situational Awareness Matters in 3D Vision Langu... | 2024-06-11 | Code |
| 7 | CREMA | 54.6 | No | CREMA: Generalizable and Efficient Video-Languag... | 2024-02-08 | Code |