Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Microsoft Cognitive Services team

Reported on 64 benchmarks across 1 task · 2 papers · 72 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
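
As a rough illustration of the note above, the sketch below groups result records by exact model-name string. The record layout and the `group_by_exact_name` helper are assumptions made for this example, not the site's actual code; the two CIDEr scores are the nocaps-XD entire values listed on this page, and the third record is a made-up placeholder that differs only in letter case.

```python
from collections import defaultdict

# Assumed record layout: (model name as reported, paper, metric, score).
records = [
    ("Microsoft Cognitive Services team", "arXiv:2111.12233", "CIDEr", 114.25),
    ("Microsoft Cognitive Services team", "arXiv:2009.13682", "CIDEr", 100.12),
    # Hypothetical placeholder entry: same team, different casing.
    ("microsoft cognitive services team", "placeholder-paper", "CIDEr", 0.0),
]


def group_by_exact_name(records):
    """Group records by the exact model-name string (no normalization)."""
    groups = defaultdict(list)
    for name, paper, metric, score in records:
        groups[name].append((paper, metric, score))
    return dict(groups)


groups = group_by_exact_name(records)

# Two different papers end up under one name, while the differently
# cased name stays in its own group.
papers = {paper for paper, _, _ in groups["Microsoft Cognitive Services team"]}
```

Under exact matching, results from both papers share one page even though the underlying models may differ, which is why the scores for the same benchmark and metric can disagree.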

Natural Language Processing · 72 results

  • Image Captioning on nocaps entire
    B1 · 2021-11-24
    85.62
    best: 88.1 (GIT, Single Model)
    SOTA
    Scaling Up Vision-Language Pre-training for Image Captioning arXiv:2111.12233
  • Image Captioning on nocaps entire
    B2 · 2021-11-24
    71.36
    best: 74.81 (GIT, Single Model)
    SOTA
    Scaling Up Vision-Language Pre-training for Image Captioning arXiv:2111.12233
  • Image Captioning on nocaps entire
    B3 · 2021-11-24
    53.62
    best: 57.68 (GIT, Single Model)
    SOTA
    Scaling Up Vision-Language Pre-training for Image Captioning arXiv:2111.12233
  • Image Captioning on nocaps entire
    B4 · 2021-11-24
    34.65
    best: 37.71 (CoCa - Google Brain)
    SOTA
    Scaling Up Vision-Language Pre-training for Image Captioning arXiv:2111.12233
  • Image Captioning on nocaps entire
    CIDEr · 2021-11-24
    114.25
    best: 126.8 (Lyrics)
    SOTA
    Scaling Up Vision-Language Pre-training for Image Captioning arXiv:2111.12233
  • Image Captioning on nocaps entire
    METEOR · 2021-11-24
    31.27
    best: 32.5 (GIT, Single Model)
    SOTA
    Scaling Up Vision-Language Pre-training for Image Captioning arXiv:2111.12233
  • Image Captioning on nocaps entire
    ROUGE-L · 2021-11-24
    61.2
    best: 63.12 (GIT, Single Model)
    SOTA
    Scaling Up Vision-Language Pre-training for Image Captioning arXiv:2111.12233
  • Image Captioning on nocaps entire
    SPICE · 2021-11-24
    14.85
    best: 15.94 (GIT, Single Model)
    SOTA
    Scaling Up Vision-Language Pre-training for Image Captioning arXiv:2111.12233
  • Image Captioning on nocaps-XD entire
    B1 · uses extra data · 2021-11-24
    85.62
    best: 88.43 (GIT2)
    SOTA
    Scaling Up Vision-Language Pre-training for Image Captioning arXiv:2111.12233
  • Image Captioning on nocaps-XD entire
    B2 · uses extra data · 2021-11-24
    71.36
    best: 75.02 (GIT2)
    SOTA
    Scaling Up Vision-Language Pre-training for Image Captioning arXiv:2111.12233
  • Image Captioning on nocaps-XD entire
    B3 · uses extra data · 2021-11-24
    53.62
    best: 57.87 (GIT2)
    SOTA
    Scaling Up Vision-Language Pre-training for Image Captioning arXiv:2111.12233
  • Image Captioning on nocaps-XD entire
    B4 · uses extra data · 2021-11-24
    34.65
    best: 37.65 (GIT2)
    SOTA
    Scaling Up Vision-Language Pre-training for Image Captioning arXiv:2111.12233
  • Image Captioning on nocaps-XD entire
    CIDEr · uses extra data · 2021-11-24
    114.25
    best: 124.77 (GIT2)
    SOTA
    Scaling Up Vision-Language Pre-training for Image Captioning arXiv:2111.12233
  • Image Captioning on nocaps-XD entire
    METEOR · uses extra data · 2021-11-24
    31.27
    best: 32.56 (GIT2)
    SOTA
    Scaling Up Vision-Language Pre-training for Image Captioning arXiv:2111.12233
  • Image Captioning on nocaps-XD entire
    ROUGE-L · uses extra data · 2021-11-24
    61.2
    best: 63.19 (GIT2)
    SOTA
    Scaling Up Vision-Language Pre-training for Image Captioning arXiv:2111.12233
  • Image Captioning on nocaps-XD entire
    SPICE · uses extra data · 2021-11-24
    14.85
    best: 16.06 (GIT2)
    SOTA
    Scaling Up Vision-Language Pre-training for Image Captioning arXiv:2111.12233
  • Image Captioning on nocaps near-domain
    B1 · 2020-09-28
    86.48
    best: 88.9 (GIT2, Single Model)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps near-domain
    B2 · 2020-09-28
    72.6
    best: 75.86 (GIT2, Single Model)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps near-domain
    B3 · 2020-09-28
    55.26
    best: 58.99 (PaLI)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps near-domain
    B4 · 2020-09-28
    36.31
    best: 39.98 (PaLI)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps near-domain
    CIDEr · 2020-09-28
    115.54
    best: 125.51 (GIT2, Single Model)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps near-domain
    METEOR · 2020-09-28
    31.8
    best: 33.47 (PaLI)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps near-domain
    ROUGE-L · 2020-09-28
    61.9
    best: 63.99 (PaLI)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps near-domain
    SPICE · 2020-09-28
    15.06
    best: 16.11 (GIT2, Single Model)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps out-of-domain
    B1 · 2020-09-28
    81.73
    best: 86.28 (PaLI)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps out-of-domain
    B2 · 2020-09-28
    65.48
    best: 71.28 (GIT, Single Model)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps out-of-domain
    B3 · 2020-09-28
    45.58
    best: 52.66 (GIT, Single Model)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps out-of-domain
    B4 · 2020-09-28
    25.78
    best: 32 (PaLI)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps out-of-domain
    CIDEr · 2020-09-28
    110.14
    best: 126.67 (PaLI)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps out-of-domain
    METEOR · 2020-09-28
    28.17
    best: 30.99 (PaLI)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps out-of-domain
    ROUGE-L · 2020-09-28
    57.57
    best: 61.35 (PaLI)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps out-of-domain
    SPICE · 2020-09-28
    13.74
    best: 15.7 (GIT, Single Model)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps-XD in-domain
    B1 · 2020-09-28
    82.94
    best: 88.86 (GIT2)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps-XD in-domain
    B2 · 2020-09-28
    67.56
    best: 76.1 (GIT)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps-XD in-domain
    B3 · 2020-09-28
    49.66
    best: 60.53 (GIT)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps-XD in-domain
    B4 · 2020-09-28
    32.07
    best: 41.65 (GIT)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps-XD in-domain
    CIDEr · 2020-09-28
    100.62
    best: 124.18 (GIT2)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps-XD in-domain
    METEOR · 2020-09-28
    30.62
    best: 33.83 (GIT2)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps-XD in-domain
    ROUGE-L · 2020-09-28
    59.43
    best: 64.02 (GIT)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps-XD in-domain
    SPICE · 2020-09-28
    14.7
    best: 16.36 (GIT2)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps in-domain
    B1 · 2020-09-28
    86.33
    best: 88.86 (GIT2, Single Model)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps in-domain
    B2 · 2020-09-28
    72.83
    best: 76.1 (GIT, Single Model)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps in-domain
    B3 · 2020-09-28
    55.94
    best: 60.53 (GIT, Single Model)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps in-domain
    B4 · 2020-09-28
    37.97
    best: 41.65 (GIT, Single Model)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps in-domain
    CIDEr · 2020-09-28
    112.82
    best: 149.1 (PaLI)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps in-domain
    METEOR · 2020-09-28
    32.7
    best: 34.22 (PaLI)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps in-domain
    ROUGE-L · 2020-09-28
    62.48
    best: 64.39 (PaLI)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps in-domain
    SPICE · 2020-09-28
    15.22
    best: 16.36 (GIT2, Single Model)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps-XD near-domain
    B1 · 2020-09-28
    82.88
    best: 88.9 (GIT2)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps-XD near-domain
    B2 · 2020-09-28
    67.01
    best: 75.86 (GIT2)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps-XD near-domain
    B3 · 2020-09-28
    48.73
    best: 58.9 (GIT2)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps-XD near-domain
    B4 · 2020-09-28
    30.21
    best: 38.95 (GIT2)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps-XD near-domain
    CIDEr · 2020-09-28
    101.2
    best: 125.51 (GIT2)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps-XD near-domain
    METEOR · 2020-09-28
    30
    best: 32.95 (GIT2)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps-XD near-domain
    ROUGE-L · 2020-09-28
    58.76
    best: 63.66 (GIT2)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps-XD near-domain
    SPICE · 2020-09-28
    14.27
    best: 16.11 (GIT2)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps-XD entire
    B1 · 2020-09-28
    82.27
    best: 88.43 (GIT2)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps-XD entire
    B2 · 2020-09-28
    66.04
    best: 75.02 (GIT2)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps-XD entire
    B3 · 2020-09-28
    47.48
    best: 57.87 (GIT2)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps-XD entire
    B4 · 2020-09-28
    28.95
    best: 37.65 (GIT2)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps-XD entire
    CIDEr · 2020-09-28
    100.12
    best: 124.77 (GIT2)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps-XD entire
    METEOR · 2020-09-28
    29.47
    best: 32.56 (GIT2)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps-XD entire
    ROUGE-L · 2020-09-28
    58.26
    best: 63.19 (GIT2)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps-XD entire
    SPICE · 2020-09-28
    14.04
    best: 16.06 (GIT2)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps-XD out-of-domain
    B1 · 2020-09-28
    79.44
    best: 86.28 (GIT2)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps-XD out-of-domain
    B2 · 2020-09-28
    61.15
    best: 71.28 (GIT)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps-XD out-of-domain
    B3 · 2020-09-28
    41.03
    best: 52.66 (GIT)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps-XD out-of-domain
    B4 · 2020-09-28
    21.79
    best: 30.15 (GIT2)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps-XD out-of-domain
    CIDEr · 2020-09-28
    95.5
    best: 122.27 (GIT2)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps-XD out-of-domain
    METEOR · 2020-09-28
    26.56
    best: 30.45 (GIT)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps-XD out-of-domain
    ROUGE-L · 2020-09-28
    55.49
    best: 60.96 (GIT)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682
  • Image Captioning on nocaps-XD out-of-domain
    SPICE · 2020-09-28
    12.66
    best: 15.7 (GIT)
    SOTA
    VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning arXiv:2009.13682