Metric: Sum(R@1,5,10) (higher is better)
| # | Model | Sum(R@1,5,10) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | PaCE | 101.5 | No | PaCE: Unified Multi-modal Dialogue Pre-training ... | 2023-05-24 | Code |
| 2 | VLMo | 83.2 | No | VLMo: Unified Vision-Language Pre-Training with ... | 2021-11-03 | Code |
| 3 | SCAN | 74.5 | No | Stacked Cross Attention for Image-Text Matching | 2018-03-21 | Code |
| 4 | DE++ | 71.1 | No | PhotoChat: A Human-Human Dialogue Dataset with P... | 2021-07-06 | - |
| 5 | ViLT | 71.0 | No | ViLT: Vision-and-Language Transformer Without Co... | 2021-02-05 | Code |
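
The table does not spell out how Sum(R@1,5,10) is computed. A minimal sketch, assuming it is the sum of Recall@1, Recall@5 and Recall@10 expressed as percentages (so the maximum would be 300), and assuming each query has exactly one ground-truth item whose 1-based rank in the retrieved list is known. The function names `recall_at_k` and `sum_recall` are hypothetical and are not taken from any of the listed papers.

```python
from typing import Sequence


def recall_at_k(ranks: Sequence[int], k: int) -> float:
    """Percentage of queries whose ground-truth item is ranked within the top k.

    `ranks` holds the 1-based rank of the correct item for each query
    (an assumption of this sketch: one correct item per query).
    """
    if not ranks:
        raise ValueError("ranks must be non-empty")
    hits = sum(1 for r in ranks if r <= k)
    return 100.0 * hits / len(ranks)


def sum_recall(ranks: Sequence[int], ks: Sequence[int] = (1, 5, 10)) -> float:
    """Sum of Recall@k over the given cutoffs (assumed meaning of Sum(R@1,5,10))."""
    return sum(recall_at_k(ranks, k) for k in ks)


# Toy example: four queries whose correct items were ranked 1st, 3rd, 7th and 12th.
example_ranks = [1, 3, 7, 12]
score = sum_recall(example_ranks)
```

Under this reading, a higher sum means the correct item tends to appear nearer the top of the ranking, which matches the "higher is better" note on the metric.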