Metric: R@10 (higher is better)
| # | Model | R@10 | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | PaCE | 49.6 | No | PaCE: Unified Multi-modal Dialogue Pre-training ... | 2023-05-24 | Code |
| 2 | VLMo | 39.4 | No | VLMo: Unified Vision-Language Pre-Training with ... | 2021-11-03 | Code |
| 3 | SCAN | 37.1 | No | Stacked Cross Attention for Image-Text Matching | 2018-03-21 | Code |
| 4 | DE++ | 35.7 | No | PhotoChat: A Human-Human Dialogue Dataset with P... | 2021-07-06 | - |
| 5 | ViLT | 25.6 | No | ViLT: Vision-and-Language Transformer Without Co... | 2021-02-05 | Code |
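
For reference, the scores above can be read against a minimal sketch of how a recall-at-k metric is commonly computed. This assumes R@10 denotes Recall@10, reported as the percentage of queries whose ground-truth item appears among the top 10 ranked candidates. The function name `recall_at_k` and its inputs are hypothetical and for illustration only; the evaluation scripts of the listed papers may differ in detail.

```python
from typing import Sequence


def recall_at_k(
    ranked_ids: Sequence[Sequence[int]],
    gold_ids: Sequence[int],
    k: int = 10,
) -> float:
    """Return the percentage of queries whose gold item is in the top-k ranking.

    ranked_ids[i] is the candidate list for query i, best candidate first.
    gold_ids[i] is the single correct candidate for query i (an assumption:
    one ground-truth item per query).
    """
    if len(ranked_ids) != len(gold_ids):
        raise ValueError("ranked_ids and gold_ids must have the same length")
    if not gold_ids:
        raise ValueError("at least one query is required")

    # Count a hit when the gold item appears within the first k candidates.
    hits = sum(
        1 for ranking, gold in zip(ranked_ids, gold_ids) if gold in ranking[:k]
    )
    return 100.0 * hits / len(gold_ids)


# Toy example with k=2: the first query is a hit, the second is a miss.
score = recall_at_k([[3, 1, 2], [5, 4, 6]], [1, 6], k=2)
```

Under this reading, a score of 49.6 would mean that the correct item was ranked in the top 10 for roughly half of the test queries.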