Metric: R@10 (higher is better)
| Rank | Model | R@10 | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | CN-CLIP (ViT-L/14@336px) | 98.7 | No | Chinese CLIP: Contrastive Vision-Language Pretra... | 2022-11-02 | Code |
| 2 | CN-CLIP (ViT-H/14) | 98.6 | No | Chinese CLIP: Contrastive Vision-Language Pretra... | 2022-11-02 | Code |
| 3 | CN-CLIP (ViT-L/14) | 98.6 | No | Chinese CLIP: Contrastive Vision-Language Pretra... | 2022-11-02 | Code |
| 4 | R2D2 (ViT-L/14) | 98.4 | No | CCMB: A Large-scale Chinese Cross-modal Benchmark | 2022-05-08 | Code |
| 5 | CN-CLIP (ViT-B/16) | 97.4 | No | Chinese CLIP: Contrastive Vision-Language Pretra... | 2022-11-02 | Code |
| 6 | InternVL-G-FT | 97.1 | No | InternVL: Scaling up Vision Foundation Models an... | 2023-12-21 | Code |
| 7 | InternVL-C-FT | 97.0 | No | InternVL: Scaling up Vision Foundation Models an... | 2023-12-21 | Code |
| 8 | R2D2 (ViT-B) | 97.0 | No | CCMB: A Large-scale Chinese Cross-modal Benchmark | 2022-05-08 | Code |
| 9 | Wukong (ViT-L/14) | 97.0 | No | Wukong: A 100 Million Large-scale Chinese Cross-... | 2022-02-14 | Code |
| 10 | Wukong (ViT-B/32) | 94.2 | No | Wukong: A 100 Million Large-scale Chinese Cross-... | 2022-02-14 | Code |
| 11 | CN-CLIP (RN50) | 94.1 | No | Chinese CLIP: Contrastive Vision-Language Pretra... | 2022-11-02 | Code |
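
For readers unfamiliar with the metric, R@10 is conventionally read as Recall@10: the percentage of queries for which a correct item appears among the 10 highest-scoring retrieved candidates. The sketch below shows one common way to compute it. It is not taken from any of the listed papers; the function name `recall_at_k`, the toy scores, and the assumption of exactly one correct candidate per query are illustrative only, and each paper's evaluation protocol may differ.

```python
def recall_at_k(scores, gold, k=10):
    """Percentage of queries whose correct item is in the top-k candidates.

    scores: one list of similarity scores per query (higher = more similar).
    gold:   index of the correct candidate for each query.
            Assumption: exactly one correct candidate per query.
    """
    if not scores or len(scores) != len(gold):
        raise ValueError("scores and gold must be non-empty and equal in length")
    hits = 0
    for row, target in zip(scores, gold):
        # Candidate indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(scores)


# Toy data (hypothetical): 3 queries, 3 candidates each.
toy_scores = [
    [0.9, 0.1, 0.3],  # correct candidate 0 is ranked 1st
    [0.2, 0.4, 0.8],  # correct candidate 1 is ranked 2nd
    [0.7, 0.6, 0.1],  # correct candidate 2 is ranked 3rd
]
toy_gold = [0, 1, 2]

r1 = recall_at_k(toy_scores, toy_gold, k=1)
r2 = recall_at_k(toy_scores, toy_gold, k=2)
r3 = recall_at_k(toy_scores, toy_gold, k=3)
```

On the toy data the recall rises as `k` grows, since each larger cutoff admits one more correct candidate. This is why recall values at large cutoffs such as 10 tend to cluster near the top of the scale.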