Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

HowToCaption

Reported on 28 benchmarks across 2 tasks · 1 paper · 2 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Computer Vision · 28 results

  • Video Captioning on YouCook2
    CIDEr · 2023-10-07
    116.4
    SOTA
    HowToCaption: Prompting LLMs to Transform Video Annotations at Scale · arXiv:2310.04900
  • Zero-Shot Video Retrieval on YouCook2
    text-to-video Median Rank · 2023-10-07
    15
    SOTA
    HowToCaption: Prompting LLMs to Transform Video Annotations at Scale · arXiv:2310.04900
  • Video Captioning on MSR-VTT
    BLEU-4 · 2023-10-07
    49.8
    best: 57.8 (mPLUG-2)
    HowToCaption: Prompting LLMs to Transform Video Annotations at Scale · arXiv:2310.04900
  • Video Captioning on MSR-VTT
    CIDEr · 2023-10-07
    65.3
    best: 80 (mPLUG-2)
    HowToCaption: Prompting LLMs to Transform Video Annotations at Scale · arXiv:2310.04900
  • Video Captioning on MSR-VTT
    METEOR · 2023-10-07
    32.2
    best: 38.7 (MV-GPT)
    HowToCaption: Prompting LLMs to Transform Video Annotations at Scale · arXiv:2310.04900
  • Video Captioning on MSR-VTT
    ROUGE-L · 2023-10-07
    66.3
    best: 70.1 (mPLUG-2)
    HowToCaption: Prompting LLMs to Transform Video Annotations at Scale · arXiv:2310.04900
  • Video Captioning on YouCook2
    BLEU-4 · 2023-10-07
    8.8
    best: 18.2 (VAST)
    HowToCaption: Prompting LLMs to Transform Video Annotations at Scale · arXiv:2310.04900
  • Video Captioning on YouCook2
    METEOR · 2023-10-07
    15.9
    best: 22.56 (UniVL + MELTR)
    HowToCaption: Prompting LLMs to Transform Video Annotations at Scale · arXiv:2310.04900
  • Video Captioning on YouCook2
    ROUGE-L · 2023-10-07
    37.3
    best: 47.04 (UniVL + MELTR)
    HowToCaption: Prompting LLMs to Transform Video Annotations at Scale · arXiv:2310.04900
  • Video Captioning on MSVD
    BLEU-4 · 2023-10-07
    70.4
    best: 80.7 (VALOR)
    HowToCaption: Prompting LLMs to Transform Video Annotations at Scale · arXiv:2310.04900
  • Video Captioning on MSVD
    CIDEr · 2023-10-07
    154.2
    best: 195.6 (MaMMUT)
    HowToCaption: Prompting LLMs to Transform Video Annotations at Scale · arXiv:2310.04900
  • Video Captioning on MSVD
    METEOR · 2023-10-07
    46.4
    best: 51.2 (VLAB)
    HowToCaption: Prompting LLMs to Transform Video Annotations at Scale · arXiv:2310.04900
  • Video Captioning on MSVD
    ROUGE-L · 2023-10-07
    83.2
    best: 87.9 (VLAB)
    HowToCaption: Prompting LLMs to Transform Video Annotations at Scale · arXiv:2310.04900
  • Zero-Shot Video Retrieval on MSR-VTT
    text-to-video Median Rank · 2023-10-07
    3
    best: 66 (MMT)
    HowToCaption: Prompting LLMs to Transform Video Annotations at Scale · arXiv:2310.04900
  • Zero-Shot Video Retrieval on MSR-VTT
    text-to-video R@1 · 2023-10-07
    37.6
    best: 55.9 (InternVideo2-6B)
    HowToCaption: Prompting LLMs to Transform Video Annotations at Scale · arXiv:2310.04900
  • Zero-Shot Video Retrieval on MSR-VTT
    text-to-video R@10 · 2023-10-07
    73.3
    best: 85.1 (InternVideo2-6B)
    HowToCaption: Prompting LLMs to Transform Video Annotations at Scale · arXiv:2310.04900
  • Zero-Shot Video Retrieval on MSR-VTT
    text-to-video R@5 · 2023-10-07
    62
    best: 78.3 (InternVideo2-6B)
    HowToCaption: Prompting LLMs to Transform Video Annotations at Scale · arXiv:2310.04900
  • Zero-Shot Video Retrieval on MSVD
    text-to-video Median Rank · 2023-10-07
    2
    HowToCaption: Prompting LLMs to Transform Video Annotations at Scale · arXiv:2310.04900
  • Zero-Shot Video Retrieval on MSVD
    text-to-video R@1 · 2023-10-07
    44.5
    best: 59.3 (InternVideo2-6B)
    HowToCaption: Prompting LLMs to Transform Video Annotations at Scale · arXiv:2310.04900
  • Zero-Shot Video Retrieval on MSVD
    text-to-video R@10 · 2023-10-07
    82.1
    best: 89.6 (InternVideo2-6B)
    HowToCaption: Prompting LLMs to Transform Video Annotations at Scale · arXiv:2310.04900
  • Zero-Shot Video Retrieval on MSVD
    text-to-video R@5 · 2023-10-07
    73.3
    best: 84.4 (InternVideo2-6B)
    HowToCaption: Prompting LLMs to Transform Video Annotations at Scale · arXiv:2310.04900
  • Zero-Shot Video Retrieval on LSMDC
    text-to-video Median Rank · 2023-10-07
    29
    best: 50.7 (MILES)
    HowToCaption: Prompting LLMs to Transform Video Annotations at Scale · arXiv:2310.04900
  • Zero-Shot Video Retrieval on LSMDC
    text-to-video R@1 · 2023-10-07
    17.3
    best: 33.8 (InternVideo2-6B)
    HowToCaption: Prompting LLMs to Transform Video Annotations at Scale · arXiv:2310.04900
  • Zero-Shot Video Retrieval on LSMDC
    text-to-video R@10 · 2023-10-07
    38.6
    best: 62.2 (InternVideo2-6B)
    HowToCaption: Prompting LLMs to Transform Video Annotations at Scale · arXiv:2310.04900
  • Zero-Shot Video Retrieval on LSMDC
    text-to-video R@5 · 2023-10-07
    31.7
    best: 55.9 (InternVideo2-6B)
    HowToCaption: Prompting LLMs to Transform Video Annotations at Scale · arXiv:2310.04900
  • Zero-Shot Video Retrieval on YouCook2
    text-to-video R@1 · 2023-10-07
    13.4
    best: 26.1 (OmniVec2)
    HowToCaption: Prompting LLMs to Transform Video Annotations at Scale · arXiv:2310.04900
  • Zero-Shot Video Retrieval on YouCook2
    text-to-video R@10 · 2023-10-07
    44.1
    best: 70.8 (OmniVec2)
    HowToCaption: Prompting LLMs to Transform Video Annotations at Scale · arXiv:2310.04900
  • Zero-Shot Video Retrieval on YouCook2
    text-to-video R@5 · 2023-10-07
    33.1
    best: 54.1 (OmniVec2)
    HowToCaption: Prompting LLMs to Transform Video Annotations at Scale · arXiv:2310.04900
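
The retrieval entries above report text-to-video R@1, R@5, R@10, and Median Rank. As a reading aid, the sketch below shows how these metrics are commonly computed from the rank of the correct video for each text query. It is a minimal illustration, not the evaluation code of the HowToCaption paper: the function names and the example ranks are hypothetical. Higher is better for R@K, while lower is better for Median Rank.

```python
from statistics import median


def recall_at_k(ranks, k):
    """Percentage of queries whose correct video is ranked within the top k."""
    return 100.0 * sum(r <= k for r in ranks) / len(ranks)


def median_rank(ranks):
    """Median of the 1-based ranks of the correct video (lower is better)."""
    return median(ranks)


# Hypothetical 1-based ranks of the ground-truth video for five text queries.
ranks = [1, 3, 2, 12, 7]

r1 = recall_at_k(ranks, 1)
r5 = recall_at_k(ranks, 5)
r10 = recall_at_k(ranks, 10)
medr = median_rank(ranks)

print(f"R@1={r1:.1f}  R@5={r5:.1f}  R@10={r10:.1f}  MedR={medr}")
```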