Metric: Percentage correct (higher is better)
| # | Model | Percentage correct | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | MCB 7 att. | 70.1 | No | Multimodal Compact Bilinear Pooling for Visual Q... | 2016-06-06 | Code |
| 2 | Dual-MFA | 70.04 | No | Co-attending Free-form Regions and Detections wi... | 2017-11-18 | Code |
| 3 | RelAtt | 69.6 | No | R-VQA: Learning Visual Relation Facts with Seman... | 2018-05-24 | Code |
| 4 | 3-Modalities: Unary + Pairwise + Ternary (ResNet) | 69.3 | No | High-Order Attention Models for Visual Question ... | 2017-11-12 | Code |
| 5 | joint-loss | 67.3 | No | Training Recurrent Answering Units with Joint Lo... | 2016-06-12 | - |
| 6 | MRN | 66.3 | No | Multimodal Residual Learning for Visual QA | 2016-06-05 | Code |
| 7 | HQI+ResNet | 66.1 | No | Hierarchical Question-Image Co-Attention for Vis... | 2016-05-31 | Code |
| 8 | FDA | 64.2 | No | A Focused Dynamic Attention Model for Visual Que... | 2016-04-06 | - |
| 9 | LSTM Q+I | 63.1 | No | VQA: Visual Question Answering | 2015-05-03 | Code |
| 10 | iBOWIMG baseline | 62 | No | Simple Baseline for Visual Question Answering | 2015-12-07 | Code |