Metric: Text-to-image R@1 (higher is better)
| # | Model | Text-to-image R@1 | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | VLPCook (R1M+) | 75.6 | No | Vision and Structured-Language Pretraining for C... | 2022-12-08 | Code |
| 2 | VLPCook | 74.7 | No | Vision and Structured-Language Pretraining for C... | 2022-12-08 | Code |
| 3 | T-Food (CLIP) | 72.6 | No | Transformer Decoders with MultiModal Regularizat... | 2022-04-20 | Code |
| 4 | T-Food | 68.3 | No | Transformer Decoders with MultiModal Regularizat... | 2022-04-20 | Code |
| 5 | X-MRS | 63.9 | No | Cross-Modal Retrieval and Synthesis (X-MRS): Clo... | 2020-12-02 | - |
| 6 | H-T | 60.3 | No | Revamping Cross-Modal Recipe Retrieval with Hier... | 2021-03-24 | Code |
| 7 | SCAN | 54.9 | No | Cross-Modal Food Retrieval: Learning a Joint Emb... | 2020-03-09 | - |
| 8 | ACME | 52.8 | No | Learning Cross-Modal Embeddings with Adversarial... | 2019-05-03 | Code |
| 9 | AdaMine | 40.2 | No | Cross-Modal Retrieval in the Cooking Context: Le... | 2018-04-30 | Code |