Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

SSML

Reported on 20 benchmarks across 5 tasks · 1 paper · 9 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
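The matching rule in the note can be illustrated with a minimal sketch. It assumes results are stored as simple records with a `model` field; the helper `group_by_exact_name`, the field names, and the two extra rows are hypothetical and are not taken from the site's actual code or data.

```python
from collections import defaultdict


def group_by_exact_name(rows):
    """Group result rows by model name, compared as exact strings.

    No normalisation is applied, so "SSML" and "ssml" land in different
    groups, while two unrelated papers that both use "SSML" are merged.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row["model"]].append(row)
    return dict(groups)


# Illustrative rows: only the first paper ID appears on this page,
# the other two are hypothetical placeholders.
rows = [
    {"model": "SSML", "paper": "arXiv:2003.03186", "benchmark": "MSVD"},
    {"model": "SSML", "paper": "hypothetical-other-paper", "benchmark": "MSR-VTT"},
    {"model": "ssml", "paper": "hypothetical-third-paper", "benchmark": "LSMDC"},
]

groups = group_by_exact_name(rows)
for name, members in groups.items():
    print(name, sorted({m["paper"] for m in members}))
```

Under this assumption, a page for one model name can mix rows from different papers, which is why the note advises checking the linked paper for each result.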

Computer Vision · 17 results

  • Video on MSVD
    text-to-video R@1 · 2020-03-06
    20.3
    best: 61.4 (InternVideo2-6B)
    SOTA
    Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning · arXiv:2003.03186
  • Video Retrieval on MSVD
    text-to-video R@1 · 2020-03-06
    20.3
    best: 61.4 (InternVideo2-6B)
    SOTA
    Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning · arXiv:2003.03186
  • Zero-Shot Video Retrieval on MSVD
    text-to-video R@1 · 2020-03-06
    13.66
    best: 59.3 (InternVideo2-6B)
    SOTA
    Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning · arXiv:2003.03186
  • Zero-Shot Video Retrieval on MSVD
    text-to-video R@10 · 2020-03-06
    47.74
    best: 89.6 (InternVideo2-6B)
    SOTA
    Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning · arXiv:2003.03186
  • Zero-Shot Video Retrieval on MSVD
    text-to-video R@5 · 2020-03-06
    35.7
    best: 84.4 (InternVideo2-6B)
    SOTA
    Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning · arXiv:2003.03186
  • Zero-Shot Video Retrieval on LSMDC
    text-to-video R@1 · 2020-03-06
    4.2
    best: 33.8 (InternVideo2-6B)
    SOTA
    Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning · arXiv:2003.03186
  • Zero-Shot Video Retrieval on LSMDC
    text-to-video R@10 · 2020-03-06
    17.1
    best: 62.2 (InternVideo2-6B)
    SOTA
    Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning · arXiv:2003.03186
  • Zero-Shot Video Retrieval on LSMDC
    text-to-video R@5 · 2020-03-06
    11.6
    best: 55.9 (InternVideo2-6B)
    SOTA
    Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning · arXiv:2003.03186
  • Video on MSVD
    text-to-video Median Rank · 2020-03-06
    6
    Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning · arXiv:2003.03186
  • Video on MSVD
    text-to-video R@10 · 2020-03-06
    63.3
    best: 90.3 (HunYuan_tvr (huge))
    Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning · arXiv:2003.03186
  • Video on MSVD
    text-to-video R@5 · 2020-03-06
    49
    best: 87.6 (CAMoE)
    Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning · arXiv:2003.03186
  • Video Retrieval on MSVD
    text-to-video Median Rank · 2020-03-06
    6
    Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning · arXiv:2003.03186
  • Video Retrieval on MSVD
    text-to-video R@10 · 2020-03-06
    63.3
    best: 90.3 (HunYuan_tvr (huge))
    Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning · arXiv:2003.03186
  • Video Retrieval on MSVD
    text-to-video R@5 · 2020-03-06
    49
    best: 87.6 (CAMoE)
    Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning · arXiv:2003.03186
  • Zero-Shot Video Retrieval on MSR-VTT
    text-to-video R@1 · 2020-03-06
    8
    best: 55.9 (InternVideo2-6B)
    Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning · arXiv:2003.03186
  • Zero-Shot Video Retrieval on MSR-VTT
    text-to-video R@10 · 2020-03-06
    29.3
    best: 85.1 (InternVideo2-6B)
    Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning · arXiv:2003.03186
  • Zero-Shot Video Retrieval on MSR-VTT
    text-to-video R@5 · 2020-03-06
    21.3
    best: 78.3 (InternVideo2-6B)
    Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning · arXiv:2003.03186
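
The retrieval entries above report R@1, R@5, R@10 and Median Rank. As a rough sketch of how such numbers are commonly computed, assuming each text query has exactly one correct video and its 1-based rank in the retrieved list is already known, the helpers below derive both metrics. The function names and the `ranks` list are hypothetical; they are not the paper's evaluation code, and the values are not results from this page.

```python
from statistics import median


def recall_at_k(ranks, k):
    """Percentage of queries whose correct item is ranked in the top k."""
    hits = sum(1 for r in ranks if r <= k)
    return 100.0 * hits / len(ranks)


def median_rank(ranks):
    """Median 1-based rank of the correct item (lower is better)."""
    return median(ranks)


# Hypothetical ranks of the correct video for eight text queries.
ranks = [1, 3, 7, 12, 2, 6, 40, 5]

r1 = recall_at_k(ranks, 1)
r5 = recall_at_k(ranks, 5)
r10 = recall_at_k(ranks, 10)
medr = median_rank(ranks)
print(r1, r5, r10, medr)
```

Higher is better for R@K, while a lower Median Rank is better, which is why the two kinds of rows above should not be compared on the same scale.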

Natural Language Processing · 3 results

  • Visual Question Answering on MSRVTT-QA
    Accuracy · 2020-03-06
    0.35
    best: 0.47 (FrozenBiLM)
    SOTA
    Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning · arXiv:2003.03186
  • Visual Question Answering (VQA) on MSVD-QA
    Accuracy · 2020-03-06
    0.351
    best: 0.61 (VLAB)
    Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning · arXiv:2003.03186
  • Visual Question Answering (VQA) on MSRVTT-QA
    Accuracy · 2020-03-06
    0.35
    best: 0.496 (VLAB)
    Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning · arXiv:2003.03186