Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Oscar

Reported on 57 benchmarks across 7 tasks · 1 paper · 25 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
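A minimal sketch of why exact-name matching can merge or split variants. The site's actual matching code is not shown on this page, so the function `group_by_exact_name`, the row layout, and the variant row are assumptions made for illustration only. The two "Oscar" values are taken from the listing below.

```python
from collections import defaultdict


def group_by_exact_name(rows):
    """Group result rows by the literal model-name string (hypothetical helper)."""
    groups = defaultdict(list)
    for row in rows:
        # No normalization: "Oscar" and "Oscar (large)" are different keys,
        # while two unrelated models both named "Oscar" would share one key.
        groups[row["model"]].append(row)
    return dict(groups)


rows = [
    {"model": "Oscar", "benchmark": "VQA v2 test-dev", "metric": "Accuracy", "value": 73.82},
    {"model": "Oscar", "benchmark": "COCO Captions", "metric": "BLEU-4", "value": 41.7},
    # Hypothetical row with a placeholder value, not a real result.
    {"model": "Oscar (hypothetical variant)", "benchmark": "COCO Captions", "metric": "BLEU-4", "value": 0.0},
]

groups = group_by_exact_name(rows)
for name, items in groups.items():
    print(name, len(items))
```

Under this assumption, a page for "Oscar" would show only the first two rows, and the variant would get its own page.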

Natural Language Processing · 44 results

  • Visual Question Answering (VQA) on VQA v2 test-dev
    Accuracy · 2020-04-13
    73.82
    best: 84.3 (PaLI)
    SOTA
    Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks · arXiv:2004.06165
  • Image Captioning on COCO Captions
    BLEU-4 · 2020-04-13
    41.7
    best: 46.5 (mPLUG)
    SOTA
    Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks · arXiv:2004.06165
  • Image Captioning on COCO Captions
    CIDEr · 2020-04-13
    140
    best: 155.1 (mPLUG)
    SOTA
    Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks · arXiv:2004.06165
  • Image Captioning on COCO Captions
    METEOR · 2020-04-13
    30.6
    best: 33.9 (CoCa)
    SOTA
    Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks · arXiv:2004.06165
  • Image Captioning on COCO Captions
    SPICE · 2020-04-13
    24.5
    best: 27 (VAST)
    SOTA
    Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks · arXiv:2004.06165
  • Cross-Modal Retrieval on COCO 2014
    Image-to-text R@1 · uses extra data · 2020-04-13
    73.5
    best: 84.8 (BEiT-3)
    SOTA
    Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks · arXiv:2004.06165
  • Cross-Modal Retrieval on COCO 2014
    Image-to-text R@10 · uses extra data · 2020-04-13
    96
    best: 98.5 (X2-VLM (large))
    SOTA
    Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks · arXiv:2004.06165
  • Cross-Modal Retrieval on COCO 2014
    Image-to-text R@5 · uses extra data · 2020-04-13
    92.2
    best: 96.5 (X2-VLM (large))
    SOTA
    Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks · arXiv:2004.06165
  • Cross-Modal Retrieval on COCO 2014
    Text-to-image R@1 · uses extra data · 2020-04-13
    57.5
    best: 68 (VAST)
    SOTA
    Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks · arXiv:2004.06165
  • Cross-Modal Retrieval on COCO 2014
    Text-to-image R@10 · uses extra data · 2020-04-13
    89.8
    best: 92.8 (VAST)
    SOTA
    Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks · arXiv:2004.06165
  • Cross-Modal Retrieval on COCO 2014
    Text-to-image R@5 · uses extra data · 2020-04-13
    82.8
    best: 92.8 (BEiT-3)
    SOTA
    Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks · arXiv:2004.06165
  • Image-to-Text Retrieval on COCO (Common Objects in Context)
    Recall@10 · 2020-04-13
    99.8
    SOTA
    Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks · arXiv:2004.06165
  • Image Captioning on nocaps near-domain
    B1
    80.54
    best: 88.9 (GIT2, Single Model)
  • Image Captioning on nocaps near-domain
    B2
    62.32
    best: 75.86 (GIT2, Single Model)
  • Image Captioning on nocaps near-domain
    B3
    40.65
    best: 58.99 (PaLI)
  • Image Captioning on nocaps near-domain
    B4
    22.37
    best: 39.98 (PaLI)
  • Image Captioning on nocaps near-domain
    CIDEr
    82.07
    best: 125.51 (GIT2, Single Model)
  • Image Captioning on nocaps near-domain
    METEOR
    25.91
    best: 33.47 (PaLI)
  • Image Captioning on nocaps near-domain
    ROUGE-L
    54.78
    best: 63.99 (PaLI)
  • Image Captioning on nocaps near-domain
    SPICE
    11.53
    best: 16.11 (GIT2, Single Model)
  • Image Captioning on nocaps entire
    B1
    79.57
    best: 88.1 (GIT, Single Model)
  • Image Captioning on nocaps entire
    B2
    60.83
    best: 74.81 (GIT, Single Model)
  • Image Captioning on nocaps entire
    B3
    38.83
    best: 57.68 (GIT, Single Model)
  • Image Captioning on nocaps entire
    B4
    21.02
    best: 37.71 (CoCa - Google Brain)
  • Image Captioning on nocaps entire
    CIDEr
    80.93
    best: 126.8 (Lyrics)
  • Image Captioning on nocaps entire
    METEOR
    25.33
    best: 32.5 (GIT, Single Model)
  • Image Captioning on nocaps entire
    ROUGE-L
    54.07
    best: 63.12 (GIT, Single Model)
  • Image Captioning on nocaps entire
    SPICE
    11.29
    best: 15.94 (GIT, Single Model)
  • Image Captioning on nocaps out-of-domain
    B1
    74.98
    best: 86.28 (PaLI)
  • Image Captioning on nocaps out-of-domain
    B2
    53.26
    best: 71.28 (GIT, Single Model)
  • Image Captioning on nocaps out-of-domain
    B3
    28.88
    best: 52.66 (GIT, Single Model)
  • Image Captioning on nocaps out-of-domain
    B4
    12.42
    best: 32 (PaLI)
  • Image Captioning on nocaps out-of-domain
    CIDEr
    73.75
    best: 126.67 (PaLI)
  • Image Captioning on nocaps out-of-domain
    METEOR
    21.73
    best: 30.99 (PaLI)
  • Image Captioning on nocaps out-of-domain
    ROUGE-L
    50
    best: 61.35 (PaLI)
  • Image Captioning on nocaps out-of-domain
    SPICE
    9.72
    best: 15.7 (GIT, Single Model)
  • Image Captioning on nocaps in-domain
    B1
    80.7
    best: 88.86 (GIT2, Single Model)
  • Image Captioning on nocaps in-domain
    B2
    63.27
    best: 76.1 (GIT, Single Model)
  • Image Captioning on nocaps in-domain
    B3
    42.86
    best: 60.53 (GIT, Single Model)
  • Image Captioning on nocaps in-domain
    B4
    25.78
    best: 41.65 (GIT, Single Model)
  • Image Captioning on nocaps in-domain
    CIDEr
    84.83
    best: 149.1 (PaLI)
  • Image Captioning on nocaps in-domain
    METEOR
    27.23
    best: 34.22 (PaLI)
  • Image Captioning on nocaps in-domain
    ROUGE-L
    55.91
    best: 64.39 (PaLI)
  • Image Captioning on nocaps in-domain
    SPICE
    12.06
    best: 16.36 (GIT2, Single Model)

Miscellaneous · 12 results

  • Image Retrieval with Multi-Modal Query on COCO 2014
    Image-to-text R@1 · uses extra data · 2020-04-13
    73.5
    best: 84.8 (BEiT-3)
    SOTA
    Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks · arXiv:2004.06165
  • Image Retrieval with Multi-Modal Query on COCO 2014
    Image-to-text R@10 · uses extra data · 2020-04-13
    96
    best: 98.5 (X2-VLM (large))
    SOTA
    Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks · arXiv:2004.06165
  • Image Retrieval with Multi-Modal Query on COCO 2014
    Image-to-text R@5 · uses extra data · 2020-04-13
    92.2
    best: 96.5 (X2-VLM (large))
    SOTA
    Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks · arXiv:2004.06165
  • Image Retrieval with Multi-Modal Query on COCO 2014
    Text-to-image R@1 · uses extra data · 2020-04-13
    57.5
    best: 68 (VAST)
    SOTA
    Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks · arXiv:2004.06165
  • Image Retrieval with Multi-Modal Query on COCO 2014
    Text-to-image R@10 · uses extra data · 2020-04-13
    89.8
    best: 92.8 (VAST)
    SOTA
    Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks · arXiv:2004.06165
  • Image Retrieval with Multi-Modal Query on COCO 2014
    Text-to-image R@5 · uses extra data · 2020-04-13
    82.8
    best: 92.8 (BEiT-3)
    SOTA
    Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks · arXiv:2004.06165
  • Cross-Modal Information Retrieval on COCO 2014
    Image-to-text R@1 · uses extra data · 2020-04-13
    73.5
    best: 84.8 (BEiT-3)
    SOTA
    Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks · arXiv:2004.06165
  • Cross-Modal Information Retrieval on COCO 2014
    Image-to-text R@10 · uses extra data · 2020-04-13
    96
    best: 98.5 (X2-VLM (large))
    SOTA
    Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks · arXiv:2004.06165
  • Cross-Modal Information Retrieval on COCO 2014
    Image-to-text R@5 · uses extra data · 2020-04-13
    92.2
    best: 96.5 (X2-VLM (large))
    SOTA
    Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks · arXiv:2004.06165
  • Cross-Modal Information Retrieval on COCO 2014
    Text-to-image R@1 · uses extra data · 2020-04-13
    57.5
    best: 68 (VAST)
    SOTA
    Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks · arXiv:2004.06165
  • Cross-Modal Information Retrieval on COCO 2014
    Text-to-image R@10 · uses extra data · 2020-04-13
    89.8
    best: 92.8 (VAST)
    SOTA
    Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks · arXiv:2004.06165
  • Cross-Modal Information Retrieval on COCO 2014
    Text-to-image R@5 · uses extra data · 2020-04-13
    82.8
    best: 92.8 (BEiT-3)
    SOTA
    Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks · arXiv:2004.06165

Computer Vision · 1 result

  • Image Retrieval on COCO (Common Objects in Context)
    Recall@10 · 2020-04-13
    98.3
    SOTA
    Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks · arXiv:2004.06165