Metric: Accuracy (higher is better)
| # | Model | Accuracy (%) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | GPT4RoI | 81.6 | No | GPT4RoI: Instruction Tuning Large Language Model... | 2023-07-07 | Code |
| 2 | ERNIE-ViL-large (ensemble of 15 models) | 70.5 | No | ERNIE-ViL: Knowledge Enhanced Vision-Language Re... | 2020-06-30 | - |
| 3 | UNITER (Large) | 62.8 | No | UNITER: UNiversal Image-TExt Representation Lear... | 2019-09-25 | Code |
| 4 | KVL-BERT (Large) | 60.3 | No | KVL-BERT: Knowledge Enhanced Visual-and-Linguist... | 2020-12-13 | - |
| 5 | VL-BERT (Large) | 59.7 | No | VL-BERT: Pre-training of Generic Visual-Linguist... | 2019-08-22 | Code |
| 6 | VL-T5 | 58.9 | No | Unifying Vision-and-Language Tasks via Text Gene... | 2021-02-04 | Code |
| 7 | VisualBERT | 52.4 | No | VisualBERT: A Simple and Performant Baseline for... | 2019-08-09 | Code |