Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Xmodal-Ctx

Reported on 6 benchmarks across 1 task · 1 paper · 1 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
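To illustrate why exact-name matching can merge different variants, here is a minimal Python sketch that groups a few records by the literal model-name string. The record layout, the model names `ModelX` and `ModelX-large`, and the paper IDs are hypothetical placeholders; this is not the site's actual matching code.

```python
from collections import defaultdict

# Hypothetical records: (model_name, paper_id, metric, value).
# All names, IDs, and numbers are placeholders for illustration only.
RECORDS = [
    ("ModelX", "paper-A", "BLEU-4", 40.0),
    ("ModelX", "paper-B", "BLEU-4", 38.0),        # a different variant that happens to share the name
    ("ModelX-large", "paper-A", "BLEU-4", 42.0),  # a different string, so it is never merged
]


def group_by_exact_name(records):
    """Group records by the model-name string exactly as written (case- and spacing-sensitive)."""
    groups = defaultdict(list)
    for name, paper, metric, value in records:
        groups[name].append((paper, metric, value))
    return dict(groups)


def papers_per_name(groups):
    """Map each name to its source papers; more than one paper hints at mixed variants."""
    return {name: sorted({paper for paper, _, _ in rows}) for name, rows in groups.items()}


if __name__ == "__main__":
    for name, papers in papers_per_name(group_by_exact_name(RECORDS)).items():
        print(name, papers)
```

Here both `ModelX` rows land under one name even though they come from different papers, while `ModelX-large` stays separate because its string differs.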

Natural Language Processing · 12 results

  • Image Captioning on COCO Captions
    BLEU-1 · 2022-05-09
    83.4
    best: 84.2 (GRIT (No VL pretraining - base))
    SOTA
    Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning (arXiv:2205.04363)
  • Image Captioning on COCO Captions
    BLEU-4 · 2022-05-09
    41.4
    best: 46.5 (mPLUG)
    Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning (arXiv:2205.04363)
  • Image Captioning on COCO Captions
    CIDEr · 2022-05-09
    139.9
    best: 155.1 (mPLUG)
    Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning (arXiv:2205.04363)
  • Image Captioning on COCO Captions
    METEOR · 2022-05-09
    30.4
    best: 33.9 (CoCa)
    Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning (arXiv:2205.04363)
  • Image Captioning on COCO Captions
    ROUGE-L · 2022-05-09
    60.4
    best: 61.1 (ExpansionNet v2 (No VL pretraining))
    Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning (arXiv:2205.04363)
  • Image Captioning on COCO Captions
    SPICE · 2022-05-09
    24
    best: 27 (VAST)
    Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning (arXiv:2205.04363)
  • Image Captioning on COCO Captions
    BLEU-1 · 2022-05-09
    81.5
    best: 84.2 (GRIT (No VL pretraining - base))
    Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning (arXiv:2205.04363)
  • Image Captioning on COCO Captions
    BLEU-4 · 2022-05-09
    39.7
    best: 46.5 (mPLUG)
    Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning (arXiv:2205.04363)
  • Image Captioning on COCO Captions
    CIDEr · 2022-05-09
    135.9
    best: 155.1 (mPLUG)
    Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning (arXiv:2205.04363)
  • Image Captioning on COCO Captions
    METEOR · 2022-05-09
    30
    best: 33.9 (CoCa)
    Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning (arXiv:2205.04363)
  • Image Captioning on COCO Captions
    ROUGE-L · 2022-05-09
    59.5
    best: 61.1 (ExpansionNet v2 (No VL pretraining))
    Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning (arXiv:2205.04363)
  • Image Captioning on COCO Captions
    SPICE · 2022-05-09
    23.7
    best: 27 (VAST)
    Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning (arXiv:2205.04363)
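The gaps between the Xmodal-Ctx scores and the best reported scores listed above can be tallied with a short Python sketch. The numbers are copied from the entries on this page; `gaps_to_best` and `best_per_metric` are hypothetical helper names. The page does not say which model variant each of the two entries per metric corresponds to, so the sketch simply keeps them as separate rows.

```python
# (metric, Xmodal-Ctx value, best reported value, best model), copied from the entries above.
# Each metric appears twice because the page lists two separate Xmodal-Ctx entries per metric.
RESULTS = [
    ("BLEU-1", 83.4, 84.2, "GRIT (No VL pretraining - base)"),
    ("BLEU-4", 41.4, 46.5, "mPLUG"),
    ("CIDEr", 139.9, 155.1, "mPLUG"),
    ("METEOR", 30.4, 33.9, "CoCa"),
    ("ROUGE-L", 60.4, 61.1, "ExpansionNet v2 (No VL pretraining)"),
    ("SPICE", 24.0, 27.0, "VAST"),
    ("BLEU-1", 81.5, 84.2, "GRIT (No VL pretraining - base)"),
    ("BLEU-4", 39.7, 46.5, "mPLUG"),
    ("CIDEr", 135.9, 155.1, "mPLUG"),
    ("METEOR", 30.0, 33.9, "CoCa"),
    ("ROUGE-L", 59.5, 61.1, "ExpansionNet v2 (No VL pretraining)"),
    ("SPICE", 23.7, 27.0, "VAST"),
]


def gaps_to_best(results):
    """Return (metric, value, gap) per row, where gap = best - value rounded to one decimal."""
    return [(metric, value, round(best - value, 1)) for metric, value, best, _ in results]


def best_per_metric(results):
    """Keep the higher of the Xmodal-Ctx entries for each metric."""
    best = {}
    for metric, value, _, _ in results:
        best[metric] = max(value, best.get(metric, float("-inf")))
    return best


if __name__ == "__main__":
    for metric, value, gap in gaps_to_best(RESULTS):
        print(f"{metric:8s} {value:6.1f}  gap to best: {gap:.1f}")
    print(len({m for m, *_ in RESULTS}), "distinct metrics,", len(RESULTS), "results")
```

Rounding to one decimal hides floating-point noise (for example, 84.2 - 83.4 is not exactly 0.8 in binary floating point). The final line reproduces the header counts: 6 distinct metrics across 12 results.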