Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

VisualSparta

Reported on 20 benchmarks across 4 tasks · 1 paper · 6 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Computer Vision · 11 results

  • Image Retrieval on Flickr30K 1K test
    R@1 · 2021-01-01
    57.4
    best: 86.9 (X-VLM (base))
    SOTA
    VisualSparta: An Embarrassingly Simple Approach to Large-scale Text-to-Image Search with Weighted Bag-of-words · arXiv:2101.00265
  • Image Retrieval on Flickr30k
    QPS · 2021-01-01
    451.4
    SOTA
    VisualSparta: An Embarrassingly Simple Approach to Large-scale Text-to-Image Search with Weighted Bag-of-words · arXiv:2101.00265
  • Image Retrieval on Flickr30k
    Recall@1 · 2021-01-01
    57.4
    best: 89.7 (BLIP-2 ViT-G (zero-shot, 1K test set))
    SOTA
    VisualSparta: An Embarrassingly Simple Approach to Large-scale Text-to-Image Search with Weighted Bag-of-words · arXiv:2101.00265
  • Image Retrieval on COCO (Common Objects in Context)
    QPS · 2021-01-01
    451.4
    SOTA
    VisualSparta: An Embarrassingly Simple Approach to Large-scale Text-to-Image Search with Weighted Bag-of-words · arXiv:2101.00265
  • Image Retrieval on COCO (Common Objects in Context)
    recall@1 · 2021-01-01
    68.2
    best: 68.3 (BLIP-2 ViT-G (fine-tuned))
    SOTA
    VisualSparta: An Embarrassingly Simple Approach to Large-scale Text-to-Image Search with Weighted Bag-of-words · arXiv:2101.00265
  • Image Retrieval on COCO (Common Objects in Context)
    recall@5 · 2021-01-01
    91.8
    SOTA
    VisualSparta: An Embarrassingly Simple Approach to Large-scale Text-to-Image Search with Weighted Bag-of-words · arXiv:2101.00265
  • Image Retrieval on Flickr30K 1K test
    R@10 · 2021-01-01
    88.1
    best: 98.7 (X-VLM (base))
    VisualSparta: An Embarrassingly Simple Approach to Large-scale Text-to-Image Search with Weighted Bag-of-words · arXiv:2101.00265
  • Image Retrieval on Flickr30K 1K test
    R@5 · 2021-01-01
    82
    best: 97.3 (X-VLM (base))
    VisualSparta: An Embarrassingly Simple Approach to Large-scale Text-to-Image Search with Weighted Bag-of-words · arXiv:2101.00265
  • Image Retrieval on Flickr30k
    Recall@10 · 2021-01-01
    88.1
    best: 98.9 (BLIP-2 ViT-G (zero-shot, 1K test set))
    VisualSparta: An Embarrassingly Simple Approach to Large-scale Text-to-Image Search with Weighted Bag-of-words · arXiv:2101.00265
  • Image Retrieval on Flickr30k
    Recall@5 · 2021-01-01
    82
    best: 98.1 (BLIP-2 ViT-G (zero-shot, 1K test set))
    VisualSparta: An Embarrassingly Simple Approach to Large-scale Text-to-Image Search with Weighted Bag-of-words · arXiv:2101.00265
  • Image Retrieval on COCO (Common Objects in Context)
    Recall@10 · 2021-01-01
    96.3
    best: 98.3 (Oscar)
    VisualSparta: An Embarrassingly Simple Approach to Large-scale Text-to-Image Search with Weighted Bag-of-words · arXiv:2101.00265

Miscellaneous · 6 results

  • Image Retrieval with Multi-Modal Query on COCO 2014
    Text-to-image R@1 · uses extra data · 2021-01-01
    44.4
    best: 68 (VAST)
    VisualSparta: An Embarrassingly Simple Approach to Large-scale Text-to-Image Search with Weighted Bag-of-words · arXiv:2101.00265
  • Image Retrieval with Multi-Modal Query on COCO 2014
    Text-to-image R@10 · uses extra data · 2021-01-01
    82.4
    best: 92.8 (VAST)
    VisualSparta: An Embarrassingly Simple Approach to Large-scale Text-to-Image Search with Weighted Bag-of-words · arXiv:2101.00265
  • Image Retrieval with Multi-Modal Query on COCO 2014
    Text-to-image R@5 · uses extra data · 2021-01-01
    72.8
    best: 92.8 (BEiT-3)
    VisualSparta: An Embarrassingly Simple Approach to Large-scale Text-to-Image Search with Weighted Bag-of-words · arXiv:2101.00265
  • Cross-Modal Information Retrieval on COCO 2014
    Text-to-image R@1 · uses extra data · 2021-01-01
    44.4
    best: 68 (VAST)
    VisualSparta: An Embarrassingly Simple Approach to Large-scale Text-to-Image Search with Weighted Bag-of-words · arXiv:2101.00265
  • Cross-Modal Information Retrieval on COCO 2014
    Text-to-image R@10 · uses extra data · 2021-01-01
    82.4
    best: 92.8 (VAST)
    VisualSparta: An Embarrassingly Simple Approach to Large-scale Text-to-Image Search with Weighted Bag-of-words · arXiv:2101.00265
  • Cross-Modal Information Retrieval on COCO 2014
    Text-to-image R@5 · uses extra data · 2021-01-01
    72.8
    best: 92.8 (BEiT-3)
    VisualSparta: An Embarrassingly Simple Approach to Large-scale Text-to-Image Search with Weighted Bag-of-words · arXiv:2101.00265

Natural Language Processing · 3 results

  • Cross-Modal Retrieval on COCO 2014
    Text-to-image R@1 · uses extra data · 2021-01-01
    44.4
    best: 68 (VAST)
    VisualSparta: An Embarrassingly Simple Approach to Large-scale Text-to-Image Search with Weighted Bag-of-words · arXiv:2101.00265
  • Cross-Modal Retrieval on COCO 2014
    Text-to-image R@10 · uses extra data · 2021-01-01
    82.4
    best: 92.8 (VAST)
    VisualSparta: An Embarrassingly Simple Approach to Large-scale Text-to-Image Search with Weighted Bag-of-words · arXiv:2101.00265
  • Cross-Modal Retrieval on COCO 2014
    Text-to-image R@5 · uses extra data · 2021-01-01
    72.8
    best: 92.8 (BEiT-3)
    VisualSparta: An Embarrassingly Simple Approach to Large-scale Text-to-Image Search with Weighted Bag-of-words · arXiv:2101.00265