Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

BLIP2 FlanT5-XL (Fine-tuned)

Reported on 6 benchmarks across 4 tasks · 1 paper

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Natural Language Processing · 6 results

Visual Question Answering (VQA) on WHOOPS!
BEM · uses extra data · 2023-03-13
score: 55
best: 57 (BLIP2 FlanT5-XXL (Fine-tuned))
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images · arXiv:2303.07274

Visual Question Answering (VQA) on WHOOPS!
Exact Match · uses extra data · 2023-03-13
score: 20
best: 21 (BLIP2 FlanT5-XXL (Fine-tuned))
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images · arXiv:2303.07274

Image Captioning on WHOOPS!
BLEU-4 · uses extra data · 2023-03-13
score: 41
best: 42 (BLIP2 FlanT5-XXL (Fine-tuned))
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images · arXiv:2303.07274

Image Captioning on WHOOPS!
CIDEr · uses extra data · 2023-03-13
score: 174
best: 177 (BLIP2 FlanT5-XXL (Fine-tuned))
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images · arXiv:2303.07274

Explanation Generation on WHOOPS!
Human (%) · uses extra data · 2023-03-13
score: 15
best: 68 (Ground-truth Caption -> GPT3 (Oracle))
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images · arXiv:2303.07274

Image-to-Text Retrieval on WHOOPS!
Specificity · uses extra data · 2023-03-13
score: 81
best: 94 (BLIP2 FlanT5-XXL (Text-only FT))
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images · arXiv:2303.07274