Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Visual Question Answering (VQA) on InfiMM-Eval

Metric: Abductive (higher is better)

Results

#  | Model                 | Abductive | Extra Data | Paper                                             | Date       | Code
1  | GPT-4V                | 77.88     | No         | GPT-4 Technical Report                            | 2023-03-15 | Code
2  | SPHINX v2             | 49.85     | No         | SPHINX: The Joint Mixing of Weights, Tasks, and ...| 2023-11-13 | Code
3  | LLaVA-1.5             | 47.91     | No         | Improved Baselines with Visual Instruction Tuning | 2023-10-05 | Code
4  | CogVLM-Chat           | 47.88     | No         | CogVLM: Visual Expert for Pretrained Language Mo...| 2023-11-06 | Code
5  | LLaMA-Adapter V2      | 46.12     | No         | LLaMA-Adapter V2: Parameter-Efficient Visual Ins...| 2023-04-28 | Code
6  | Qwen-VL-Chat          | 44.39     | No         | Qwen-VL: A Versatile Vision-Language Model for U...| 2023-08-24 | Code
7  | InstructBLIP          | 37.76     | No         | InstructBLIP: Towards General-purpose Vision-Lan...| 2023-05-11 | Code
8  | Emu                   | 36.57     | No         | Emu: Generative Pretraining in Multimodality      | 2023-07-11 | Code
9  | InternLM-XComposer-VL | 35.97     | No         | InternLM-XComposer: A Vision-Language Large Mode...| 2023-09-26 | Code
10 | Otter                 | 33.64     | No         | Otter: A Multi-Modal Model with In-Context Instr...| 2023-05-05 | Code
11 | mPLUG-Owl2            | 20.6      | No         | mPLUG-Owl2: Revolutionizing Multi-modal Large La...| 2023-11-07 | Code
12 | BLIP-2-OPT2.7B        | 18.96     | No         | BLIP-2: Bootstrapping Language-Image Pre-trainin...| 2023-01-30 | Code
13 | MiniGPT-v2            | 13.28     | No         | MiniGPT-4: Enhancing Vision-Language Understandi...| 2023-04-20 | Code
14 | OpenFlamingo-v2       | 5.3       | No         | OpenFlamingo: An Open-Source Framework for Train...| 2023-08-02 | Code
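Because the Abductive metric is higher-is-better, the ranking above is simply the rows sorted by score in descending order. A minimal Python sketch using the table's values (the list structure here is an illustration, not an official API):

```python
# Leaderboard rows as (model, Abductive score) pairs, taken from the table above.
leaderboard = [
    ("GPT-4V", 77.88),
    ("SPHINX v2", 49.85),
    ("LLaVA-1.5", 47.91),
    ("CogVLM-Chat", 47.88),
    ("LLaMA-Adapter V2", 46.12),
    ("Qwen-VL-Chat", 44.39),
    ("InstructBLIP", 37.76),
    ("Emu", 36.57),
    ("InternLM-XComposer-VL", 35.97),
    ("Otter", 33.64),
    ("mPLUG-Owl2", 20.6),
    ("BLIP-2-OPT2.7B", 18.96),
    ("MiniGPT-v2", 13.28),
    ("OpenFlamingo-v2", 5.3),
]

# "Higher is better": rank by score, descending.
ranked = sorted(leaderboard, key=lambda row: row[1], reverse=True)
for rank, (model, score) in enumerate(ranked, start=1):
    print(f"{rank:2d}. {model:<22} {score:.2f}")
```

Sorting on the score column reproduces the ranks shown, with GPT-4V first and OpenFlamingo-v2 last.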