Metric: Accuracy (%, higher is better)
| # | Model | Accuracy | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | GPT4RoI | 91 | No | GPT4RoI: Instruction Tuning Large Language Model... | 2023-07-07 | Code |
| 2 | ERNIE-ViL-large (ensemble of 15 models) | 86.1 | No | ERNIE-ViL: Knowledge Enhanced Vision-Language Re... | 2020-06-30 | - |
| 3 | UNITER-large (ensemble of 10 models) | 83.4 | No | UNITER: UNiversal Image-TExt Representation Lear... | 2019-09-25 | Code |
| 4 | UNITER (Large) | 80.8 | No | UNITER: UNiversal Image-TExt Representation Lear... | 2019-09-25 | Code |
| 5 | KVL-BERT (Large) | 78.6 | No | KVL-BERT: Knowledge Enhanced Visual-and-Linguist... | 2020-12-13 | - |
| 6 | VL-BERT (Large) | 78.4 | No | VL-BERT: Pre-training of Generic Visual-Linguist... | 2019-08-22 | Code |
| 7 | VL-T5 | 77.8 | No | Unifying Vision-and-Language Tasks via Text Gene... | 2021-02-04 | Code |
| 8 | VisualBERT | 73.2 | No | VisualBERT: A Simple and Performant Baseline for... | 2019-08-09 | Code |