Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

GIT, Single Model

Reported on 32 benchmarks across 1 task · 1 paper · 17 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
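As a rough illustration of what exact-name matching implies, the sketch below groups result rows by the literal model-name string. It is not the site's actual code: the function `group_by_exact_name` and the row layout are assumptions made for this example. The scores are taken from the nocaps listings on this page.

```python
from collections import defaultdict


def group_by_exact_name(rows):
    """Group (model_name, benchmark, metric, score) rows by exact model name.

    Hypothetical helper: names are compared as literal strings, so any
    difference in spelling, case, or punctuation yields a separate group.
    """
    grouped = defaultdict(list)
    for model_name, benchmark, metric, score in rows:
        grouped[model_name].append((benchmark, metric, score))
    return dict(grouped)


# Example rows using scores listed on this page.
rows = [
    ("GIT, Single Model", "nocaps near-domain", "CIDEr", 123.92),
    ("GIT2, Single Model", "nocaps near-domain", "CIDEr", 125.51),
    ("GIT, Single Model", "nocaps entire", "CIDEr", 123.39),
]

grouped = group_by_exact_name(rows)
for name, results in grouped.items():
    print(name, len(results))
```

Under this scheme "GIT, Single Model" and "GIT2, Single Model" stay separate. Two papers that both report a model under the identical string would be merged even if the underlying variants differ, which is the caveat the note describes.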

Natural Language Processing · 32 results

Image Captioning on nocaps entire
B1 · 2022-05-27
88.1
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps entire
B2 · 2022-05-27
74.81
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps entire
B3 · 2022-05-27
57.68
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps entire
B4 · 2022-05-27
37.35
best: 37.71 (CoCa - Google Brain)
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps entire
CIDEr · 2022-05-27
123.39
best: 126.8 (Lyrics)
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps entire
METEOR · 2022-05-27
32.5
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps entire
ROUGE-L · 2022-05-27
63.12
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps entire
SPICE · 2022-05-27
15.94
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps out-of-domain
B2 · 2022-05-27
71.28
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps out-of-domain
B3 · 2022-05-27
52.66
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps out-of-domain
METEOR · 2022-05-27
30.45
best: 30.99 (PaLI)
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps out-of-domain
ROUGE-L · 2022-05-27
60.96
best: 61.35 (PaLI)
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps out-of-domain
SPICE · 2022-05-27
15.7
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps in-domain
B2 · 2022-05-27
76.1
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps in-domain
B3 · 2022-05-27
60.53
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps in-domain
B4 · 2022-05-27
41.65
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps in-domain
ROUGE-L · 2022-05-27
64.02
best: 64.39 (PaLI)
SOTA
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps near-domain
B1 · 2022-05-27
88.56
best: 88.9 (GIT2, Single Model)
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps near-domain
B2 · 2022-05-27
75.48
best: 75.86 (GIT2, Single Model)
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps near-domain
B3 · 2022-05-27
58.46
best: 58.99 (PaLI)
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps near-domain
B4 · 2022-05-27
38.44
best: 39.98 (PaLI)
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps near-domain
CIDEr · 2022-05-27
123.92
best: 125.51 (GIT2, Single Model)
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps near-domain
METEOR · 2022-05-27
32.86
best: 33.47 (PaLI)
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps near-domain
ROUGE-L · 2022-05-27
63.5
best: 63.99 (PaLI)
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps near-domain
SPICE · 2022-05-27
15.96
best: 16.11 (GIT2, Single Model)
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps out-of-domain
B1 · 2022-05-27
85.99
best: 86.28 (PaLI)
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps out-of-domain
B4 · 2022-05-27
30.04
best: 32 (PaLI)
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps out-of-domain
CIDEr · 2022-05-27
122.04
best: 126.67 (PaLI)
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps in-domain
B1 · 2022-05-27
88.55
best: 88.86 (GIT2, Single Model)
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps in-domain
CIDEr · 2022-05-27
122.4
best: 149.1 (PaLI)
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps in-domain
METEOR · 2022-05-27
33.41
best: 34.22 (PaLI)
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100
Image Captioning on nocaps in-domain
SPICE · 2022-05-27
16.18
best: 16.36 (GIT2, Single Model)
GIT: A Generative Image-to-text Transformer for Vision and Language arXiv:2205.14100