Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Visual Question Answering (VQA) on OK-VQA

Metric: Accuracy (higher is better)

Results

| # | Model | Accuracy | Extra Data | Paper | Date | Code |
|---|-------|----------|------------|-------|------|------|
| 1 | PaLI-X-VPD | 66.8 | No | Visual Program Distillation: Distilling Tools an... | 2023-12-05 | - |
| 2 | PaLM-E-562B | 66.1 | No | PaLM-E: An Embodied Multimodal Language Model | 2023-03-06 | Code |
| 3 | PaLI-X (Single-task FT) | 66.1 | No | PaLI-X: On Scaling up a Multilingual Vision and ... | 2023-05-29 | Code |
| 4 | PaLI 17B | 64.5 | No | PaLI: A Jointly-Scaled Multilingual Language-Ima... | 2022-09-14 | Code |
| 5 | Prophet | 62.5 | No | Prophet: Prompting Large Language Models with Co... | 2023-03-03 | Code |
| 6 | RA-VQA-v2 (BLIP 2) | 62.08 | No | Fine-grained Late-interaction Multi-modal Retrie... | 2023-09-29 | Code |
| 7 | A Simple Baseline for KB-VQA | 61.2 | No | A Simple Baseline for Knowledge-Based Visual Que... | 2023-10-20 | - |
| 8 | PromptCap | 60.4 | No | PromptCap: Prompt-Guided Task-Aware Image Captio... | 2022-11-15 | Code |
| 9 | ReVeaL WIT + CC12M + Wikidata + VQA-2 | 59.1 | No | REVEAL: Retrieval-Augmented Visual-Language Pre-... | 2022-12-10 | Code |
| 10 | Lyrics | 58.2 | No | Lyrics: Boosting Fine-grained Language-Vision Al... | 2023-12-08 | - |
| 11 | REVIVE (Ensemble) | 58 | No | REVIVE: Regional Visual Representation Matters i... | 2022-06-02 | Code |
| 12 | REVIVE (Single) | 56.6 | No | REVIVE: Regional Visual Representation Matters i... | 2022-06-02 | Code |
| 13 | RA-VQA-v2 (T5-large) | 54.85 | No | Fine-grained Late-interaction Multi-modal Retrie... | 2023-09-29 | Code |
| 14 | RA-VQA (T5-large) | 54.48 | No | Retrieval Augmented Visual Question Answering wi... | 2022-10-07 | Code |
| 15 | VK-OOD | 52.4 | No | - | - | Code |
| 16 | VK-OOD | 52.4 | No | - | - | Code |
| 17 | RA-VQA-FrDPR (T5-large) | 51.22 | No | Retrieval Augmented Visual Question Answering wi... | 2022-10-07 | Code |
| 18 | Flamingo80B | 50.6 | No | Flamingo: a Visual Language Model for Few-Shot L... | 2022-04-29 | Code |
| 19 | TRiG (T5-Large) | 50.5 | No | - | - | - |
| 20 | HYDRA | 48.6 | No | HYDRA: A Hyper Agent for Dynamic Compositional V... | 2024-03-19 | Code |
| 21 | PICa | 48 | Yes | An Empirical Study of GPT-3 for Few-Shot Knowled... | 2021-09-10 | Code |
| 22 | LaKo | 47.01 | No | LaKo: Knowledge-driven Visual Question Answering... | 2022-07-26 | Code |
| 23 | BLIP-2 ViT-G FlanT5 XXL (zero-shot) | 45.9 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 24 | Flamingo9B | 44.7 | No | Flamingo: a Visual Language Model for Few-Shot L... | 2022-04-29 | Code |
| 25 | VLC-BERT | 43.1 | No | VLC-BERT: Visual Question Answering with Context... | 2022-10-24 | Code |
| 26 | T5(Tan and Bansal, 2019) + Prefixes | 42.03 | No | LaKo: Knowledge-driven Visual Question Answering... | 2022-07-26 | Code |
| 27 | Flamingo3B | 41.2 | No | Flamingo: a Visual Language Model for Few-Shot L... | 2022-04-29 | Code |
| 28 | BLIP-2 ViT-G FlanT5 XL (zero-shot) | 40.7 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 29 | BLIP-2 ViT-L FlanT5 XL (zero-shot) | 39.4 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 30 | BLIP-2 ViT-G OPT 6.7B (zero-shot) | 36.4 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 31 | PNP-VQA | 35.9 | No | Plug-and-Play VQA: Zero-shot VQA by Conjoining L... | 2022-10-17 | Code |
| 32 | BLIP-2 ViT-G OPT 2.7B (zero-shot) | 31.7 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 33 | BLIP-2 ViT-L OPT 2.7B (zero-shot) | 30.2 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 34 | FewVLM | 16.5 | No | A Good Prompt Is Worth Millions of Parameters: L... | 2021-10-16 | Code |
| 35 | MetaLM | 11.4 | No | Language Models are General-Purpose Interfaces | 2022-06-13 | Code |
| 36 | VLKD(ViT-B/16) | 10.5 | No | - | - | - |
| 37 | Frozen | 5.9 | No | Multimodal Few-Shot Learning with Frozen Languag... | 2021-06-25 | - |