Metric: Image-to-text R@1 (higher is better)
| # | Model | Image-to-text R@1 | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | VLPCook (R1M+) | 74.9 | No | Vision and Structured-Language Pretraining for C... | 2022-12-08 | Code |
| 2 | VLPCook | 73.6 | No | Vision and Structured-Language Pretraining for C... | 2022-12-08 | Code |
| 3 | T-Food (CLIP) | 72.3 | No | Transformer Decoders with MultiModal Regularizat... | 2022-04-20 | Code |
| 4 | T-Food | 68.2 | No | Transformer Decoders with MultiModal Regularizat... | 2022-04-20 | Code |
| 5 | X-MRS | 64 | No | Cross-Modal Retrieval and Synthesis (X-MRS): Clo... | 2020-12-02 | - |
| 6 | H-T | 60 | No | Revamping Cross-Modal Recipe Retrieval with Hier... | 2021-03-24 | Code |
| 7 | SCAN | 54 | No | Cross-Modal Food Retrieval: Learning a Joint Emb... | 2020-03-09 | - |
| 8 | ACME | 51.8 | No | Learning Cross-Modal Embeddings with Adversarial... | 2019-05-03 | Code |
| 9 | AdaMine | 39.8 | No | Cross-Modal Retrieval in the Cooking Context: Le... | 2018-04-30 | Code |
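
For readers unfamiliar with the metric, here is a minimal sketch of how an image-to-text R@1 score of this kind is typically computed: each image embedding is compared against every candidate text embedding, and the query counts as a hit when its top-ranked text is the matching one. The function name `recall_at_1`, the use of cosine similarity, and the convention that row `i` of both arrays forms a matched pair are assumptions made for illustration. The table does not state the evaluation protocol (candidate pool size, number of sampled subsets), so this is not necessarily the exact procedure behind the numbers above.

```python
import numpy as np


def recall_at_1(image_emb: np.ndarray, text_emb: np.ndarray) -> float:
    """Image-to-text R@1 in percent.

    Assumes row i of image_emb and row i of text_emb are a matched pair
    (an illustrative convention, not taken from the leaderboard).
    """
    # L2-normalise so that the dot product equals cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # sim[i, j] = similarity between image i and text j.
    sim = image_emb @ text_emb.T

    # Index of the best-scoring text for each image query.
    top1 = sim.argmax(axis=1)

    # A hit means the top-ranked text is the paired one.
    hits = top1 == np.arange(sim.shape[0])
    return float(100.0 * hits.mean())


if __name__ == "__main__":
    images = np.eye(4)
    # Swap the first two texts so that two of four queries miss.
    texts = np.eye(4)[[1, 0, 2, 3]]
    print(recall_at_1(images, texts))
```

The score is returned as a percentage so that it is on the same scale as the table's R@1 column.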