Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

GIT2

Reported on 37 benchmarks across 2 tasks · 1 paper · 27 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
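
The effect of exact-name matching can be sketched as follows. This is a minimal illustration, not the site's actual implementation: the `group_by_model_name` helper, the row layout, and the second and third rows are hypothetical; only the name GIT2 and arXiv:2205.14100 come from this page.

```python
from collections import defaultdict


def group_by_model_name(rows):
    """Group result rows by exact, case-sensitive model name (hypothetical helper)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["model"]].append(row)
    return dict(grouped)


rows = [
    # From this page: GIT2 results come from arXiv:2205.14100.
    {"model": "GIT2", "paper": "arXiv:2205.14100"},
    # Hypothetical: another paper reusing the name "GIT2" for a different variant.
    {"model": "GIT2", "paper": "hypothetical-other-paper"},
    # Hypothetical: a spelling variant that exact matching treats as a separate model.
    {"model": "GIT-2", "paper": "hypothetical-third-paper"},
]

grouped = group_by_model_name(rows)
papers_for_git2 = sorted(row["paper"] for row in grouped["GIT2"])
```

Under these assumptions, both "GIT2" rows land on the same model page even though they come from different papers, while "GIT-2" gets a page of its own.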

Natural Language Processing (32 results)

  • Image Captioning on nocaps-XD in-domain
    B1 · 2022-05-27
    88.86
    SOTA
    GIT: A Generative Image-to-text Transformer for Vision and Language (arXiv:2205.14100)
  • Image Captioning on nocaps-XD in-domain
    CIDEr · 2022-05-27
    124.18
    SOTA
    GIT: A Generative Image-to-text Transformer for Vision and Language (arXiv:2205.14100)
  • Image Captioning on nocaps-XD in-domain
    METEOR · 2022-05-27
    33.83
    SOTA
    GIT: A Generative Image-to-text Transformer for Vision and Language (arXiv:2205.14100)
  • Image Captioning on nocaps-XD in-domain
    SPICE · 2022-05-27
    16.36
    SOTA
    GIT: A Generative Image-to-text Transformer for Vision and Language (arXiv:2205.14100)
  • Image Captioning on nocaps-XD near-domain
    B1 · 2022-05-27
    88.9
    SOTA
    GIT: A Generative Image-to-text Transformer for Vision and Language (arXiv:2205.14100)
  • Image Captioning on nocaps-XD near-domain
    B2 · 2022-05-27
    75.86
    SOTA
    GIT: A Generative Image-to-text Transformer for Vision and Language (arXiv:2205.14100)
  • Image Captioning on nocaps-XD near-domain
    B3 · 2022-05-27
    58.9
    SOTA
    GIT: A Generative Image-to-text Transformer for Vision and Language (arXiv:2205.14100)
  • Image Captioning on nocaps-XD near-domain
    B4 · 2022-05-27
    38.95
    SOTA
    GIT: A Generative Image-to-text Transformer for Vision and Language (arXiv:2205.14100)
  • Image Captioning on nocaps-XD near-domain
    CIDEr · 2022-05-27
    125.51
    SOTA
    GIT: A Generative Image-to-text Transformer for Vision and Language (arXiv:2205.14100)
  • Image Captioning on nocaps-XD near-domain
    METEOR · 2022-05-27
    32.95
    SOTA
    GIT: A Generative Image-to-text Transformer for Vision and Language (arXiv:2205.14100)
  • Image Captioning on nocaps-XD near-domain
    ROUGE-L · 2022-05-27
    63.66
    SOTA
    GIT: A Generative Image-to-text Transformer for Vision and Language (arXiv:2205.14100)
  • Image Captioning on nocaps-XD near-domain
    SPICE · 2022-05-27
    16.11
    SOTA
    GIT: A Generative Image-to-text Transformer for Vision and Language (arXiv:2205.14100)
  • Image Captioning on nocaps-XD entire
    B1 · 2022-05-27
    88.43
    SOTA
    GIT: A Generative Image-to-text Transformer for Vision and Language (arXiv:2205.14100)
  • Image Captioning on nocaps-XD entire
    B2 · 2022-05-27
    75.02
    SOTA
    GIT: A Generative Image-to-text Transformer for Vision and Language (arXiv:2205.14100)
  • Image Captioning on nocaps-XD entire
    B3 · 2022-05-27
    57.87
    SOTA
    GIT: A Generative Image-to-text Transformer for Vision and Language (arXiv:2205.14100)
  • Image Captioning on nocaps-XD entire
    B4 · 2022-05-27
    37.65
    SOTA
    GIT: A Generative Image-to-text Transformer for Vision and Language (arXiv:2205.14100)
  • Image Captioning on nocaps-XD entire
    CIDEr · 2022-05-27
    124.77
    SOTA
    GIT: A Generative Image-to-text Transformer for Vision and Language (arXiv:2205.14100)
  • Image Captioning on nocaps-XD entire
    METEOR · 2022-05-27
    32.56
    SOTA
    GIT: A Generative Image-to-text Transformer for Vision and Language (arXiv:2205.14100)
  • Image Captioning on nocaps-XD entire
    ROUGE-L · 2022-05-27
    63.19
    SOTA
    GIT: A Generative Image-to-text Transformer for Vision and Language (arXiv:2205.14100)
  • Image Captioning on nocaps-XD entire
    SPICE · 2022-05-27
    16.06
    SOTA
    GIT: A Generative Image-to-text Transformer for Vision and Language (arXiv:2205.14100)
  • Image Captioning on nocaps-XD out-of-domain
    B1 · 2022-05-27
    86.28
    SOTA
    GIT: A Generative Image-to-text Transformer for Vision and Language (arXiv:2205.14100)
  • Image Captioning on nocaps-XD out-of-domain
    B4 · 2022-05-27
    30.15
    SOTA
    GIT: A Generative Image-to-text Transformer for Vision and Language (arXiv:2205.14100)
  • Image Captioning on nocaps-XD out-of-domain
    CIDEr · 2022-05-27
    122.27
    SOTA
    GIT: A Generative Image-to-text Transformer for Vision and Language (arXiv:2205.14100)
  • Image Captioning on nocaps-XD in-domain
    B2 · 2022-05-27
    75.86
    best: 76.1 (GIT)
    GIT: A Generative Image-to-text Transformer for Vision and Language (arXiv:2205.14100)
  • Image Captioning on nocaps-XD in-domain
    B3 · 2022-05-27
    59.94
    best: 60.53 (GIT)
    GIT: A Generative Image-to-text Transformer for Vision and Language (arXiv:2205.14100)
  • Image Captioning on nocaps-XD in-domain
    B4 · 2022-05-27
    41.1
    best: 41.65 (GIT)
    GIT: A Generative Image-to-text Transformer for Vision and Language (arXiv:2205.14100)
  • Image Captioning on nocaps-XD in-domain
    ROUGE-L · 2022-05-27
    63.82
    best: 64.02 (GIT)
    GIT: A Generative Image-to-text Transformer for Vision and Language (arXiv:2205.14100)
  • Image Captioning on nocaps-XD out-of-domain
    B2 · 2022-05-27
    71.15
    best: 71.28 (GIT)
    GIT: A Generative Image-to-text Transformer for Vision and Language (arXiv:2205.14100)
  • Image Captioning on nocaps-XD out-of-domain
    B3 · 2022-05-27
    52.36
    best: 52.66 (GIT)
    GIT: A Generative Image-to-text Transformer for Vision and Language (arXiv:2205.14100)
  • Image Captioning on nocaps-XD out-of-domain
    METEOR · 2022-05-27
    30.15
    best: 30.45 (GIT)
    GIT: A Generative Image-to-text Transformer for Vision and Language (arXiv:2205.14100)
  • Image Captioning on nocaps-XD out-of-domain
    ROUGE-L · 2022-05-27
    60.91
    best: 60.96 (GIT)
    GIT: A Generative Image-to-text Transformer for Vision and Language (arXiv:2205.14100)
  • Image Captioning on nocaps-XD out-of-domain
    SPICE · 2022-05-27
    15.62
    best: 15.7 (GIT)
    GIT: A Generative Image-to-text Transformer for Vision and Language (arXiv:2205.14100)

Computer Vision (5 results)

  • Video Captioning on MSR-VTT
    BLEU-4 · uses extra data · 2022-05-27
    54.8
    best: 57.8 (mPLUG-2)
    SOTA
    GIT: A Generative Image-to-text Transformer for Vision and Language (arXiv:2205.14100)
  • Video Captioning on MSR-VTT
    CIDEr · uses extra data · 2022-05-27
    75.9
    best: 80 (mPLUG-2)
    SOTA
    GIT: A Generative Image-to-text Transformer for Vision and Language (arXiv:2205.14100)
  • Video Captioning on MSR-VTT
    GS · uses extra data · 2022-05-27
    201.6
    SOTA
    GIT: A Generative Image-to-text Transformer for Vision and Language (arXiv:2205.14100)
  • Video Captioning on MSR-VTT
    ROUGE-L · uses extra data · 2022-05-27
    68.2
    best: 70.1 (mPLUG-2)
    SOTA
    GIT: A Generative Image-to-text Transformer for Vision and Language (arXiv:2205.14100)
  • Video Captioning on MSR-VTT
    METEOR · uses extra data · 2022-05-27
    33.1
    best: 38.7 (MV-GPT)
    GIT: A Generative Image-to-text Transformer for Vision and Language (arXiv:2205.14100)