Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

GIT2

Reported on 37 benchmarks across 2 tasks · 1 paper · 27 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
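
As a rough illustration of the note above, here is a minimal sketch of grouping results by exact model name. The record layout and the `group_by_exact_name` helper are hypothetical and are not the site's actual implementation; the sample values are taken from this page. Under exact string matching, "GIT" and "GIT2" end up on separate pages, and two papers that both report a model called "GIT2" would be merged even if they describe different variants.

```python
from collections import defaultdict

# Hypothetical result records: (model name as reported, benchmark, metric, value).
results = [
    ("GIT2", "nocaps-XD entire", "CIDEr", 124.77),
    ("GIT", "nocaps-XD in-domain", "B2", 76.1),
    ("GIT2", "MSR-VTT", "CIDEr", 75.9),
]


def group_by_exact_name(records):
    """Group records by exact model name (plain string equality, an assumption)."""
    grouped = defaultdict(list)
    for name, benchmark, metric, value in records:
        # No normalization: "GIT" and "GIT2" are distinct keys.
        grouped[name].append((benchmark, metric, value))
    return dict(grouped)


pages = group_by_exact_name(results)
```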

Natural Language Processing · 32 results

Image Captioning on nocaps-XD in-domain
B1 · 2022-05-27
88.86
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps-XD in-domain
CIDEr · 2022-05-27
124.18
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps-XD in-domain
METEOR · 2022-05-27
33.83
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps-XD in-domain
SPICE · 2022-05-27
16.36
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps-XD near-domain
B1 · 2022-05-27
88.9
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps-XD near-domain
B2 · 2022-05-27
75.86
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps-XD near-domain
B3 · 2022-05-27
58.9
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps-XD near-domain
B4 · 2022-05-27
38.95
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps-XD near-domain
CIDEr · 2022-05-27
125.51
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps-XD near-domain
METEOR · 2022-05-27
32.95
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps-XD near-domain
ROUGE-L · 2022-05-27
63.66
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps-XD near-domain
SPICE · 2022-05-27
16.11
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps-XD entire
B1 · 2022-05-27
88.43
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps-XD entire
B2 · 2022-05-27
75.02
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps-XD entire
B3 · 2022-05-27
57.87
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps-XD entire
B4 · 2022-05-27
37.65
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps-XD entire
CIDEr · 2022-05-27
124.77
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps-XD entire
METEOR · 2022-05-27
32.56
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps-XD entire
ROUGE-L · 2022-05-27
63.19
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps-XD entire
SPICE · 2022-05-27
16.06
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps-XD out-of-domain
B1 · 2022-05-27
86.28
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps-XD out-of-domain
B4 · 2022-05-27
30.15
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps-XD out-of-domain
CIDEr · 2022-05-27
122.27
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps-XD in-domain
B2 · 2022-05-27
75.86
best: 76.1 (GIT)
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps-XD in-domain
B3 · 2022-05-27
59.94
best: 60.53 (GIT)
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps-XD in-domain
B4 · 2022-05-27
41.1
best: 41.65 (GIT)
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps-XD in-domain
ROUGE-L · 2022-05-27
63.82
best: 64.02 (GIT)
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps-XD out-of-domain
B2 · 2022-05-27
71.15
best: 71.28 (GIT)
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps-XD out-of-domain
B3 · 2022-05-27
52.36
best: 52.66 (GIT)
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps-XD out-of-domain
METEOR · 2022-05-27
30.15
best: 30.45 (GIT)
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps-XD out-of-domain
ROUGE-L · 2022-05-27
60.91
best: 60.96 (GIT)
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps-XD out-of-domain
SPICE · 2022-05-27
15.62
best: 15.7 (GIT)
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100

Computer Vision · 5 results

Video Captioning on MSR-VTT
BLEU-4 · uses extra data · 2022-05-27
54.8
best: 57.8 (mPLUG-2)
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Video Captioning on MSR-VTT
CIDEr · uses extra data · 2022-05-27
75.9
best: 80 (mPLUG-2)
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Video Captioning on MSR-VTT
GS · uses extra data · 2022-05-27
201.6
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Video Captioning on MSR-VTT
ROUGE-L · uses extra data · 2022-05-27
68.2
best: 70.1 (mPLUG-2)
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Video Captioning on MSR-VTT
METEOR · uses extra data · 2022-05-27
33.1
best: 38.7 (MV-GPT)
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100