Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Visual Question Answering (VQA) on InfiMM-Eval

Metric: Deductive (higher is better)


Results

| # | Model | Deductive | Extra Data | Paper | Date | Code |
|---|-------|-----------|------------|-------|------|------|
| 1 | GPT-4V | 74.86 | No | GPT-4 Technical Report | 2023-03-15 | Code |
| 2 | SPHINX v2 | 42.17 | No | SPHINX: The Joint Mixing of Weights, Tasks, and ... | 2023-11-13 | Code |
| 3 | Qwen-VL-Chat | 37.55 | No | Qwen-VL: A Versatile Vision-Language Model for U... | 2023-08-24 | Code |
| 4 | CogVLM-Chat | 36.75 | No | CogVLM: Visual Expert for Pretrained Language Mo... | 2023-11-06 | Code |
| 5 | LLaVA-1.5 | 30.94 | No | Improved Baselines with Visual Instruction Tuning | 2023-10-05 | Code |
| 6 | Emu | 28.9 | No | Emu: Generative Pretraining in Multimodality | 2023-07-11 | Code |
| 7 | LLaMA-Adapter V2 | 28.7 | No | LLaMA-Adapter V2: Parameter-Efficient Visual Ins... | 2023-04-28 | Code |
| 8 | InstructBLIP | 27.56 | No | InstructBLIP: Towards General-purpose Vision-Lan... | 2023-05-11 | Code |
| 9 | InternLM-XComposer-VL | 26.77 | No | InternLM-XComposer: A Vision-Language Large Mode... | 2023-09-26 | Code |
| 10 | mPLUG-Owl2 | 23.43 | No | mPLUG-Owl2: Revolutionizing Multi-modal Large La... | 2023-11-07 | Code |
| 11 | Otter | 22.49 | No | Otter: A Multi-Modal Model with In-Context Instr... | 2023-05-05 | Code |
| 12 | MiniGPT-v2 | 11.02 | No | MiniGPT-4: Enhancing Vision-Language Understandi... | 2023-04-20 | Code |
| 13 | OpenFlamingo-v2 | 8.88 | No | OpenFlamingo: An Open-Source Framework for Train... | 2023-08-02 | Code |
| 14 | BLIP-2-OPT-2.7B | 2.76 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
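For readers who want to work with these numbers programmatically, the leaderboard can be represented as a list of structured records and ranked by the Deductive score (higher is better). A minimal sketch in Python, using the top four entries above; the `Result` type and field names are illustrative, not an official Papers With Code schema or API:

```python
# Minimal sketch: a few InfiMM-Eval Deductive leaderboard entries as records.
# The Result type and its field names are illustrative assumptions,
# not an official Papers With Code data model.
from dataclasses import dataclass


@dataclass
class Result:
    model: str
    deductive: float  # Deductive score on InfiMM-Eval (higher is better)
    date: str         # paper date, YYYY-MM-DD


results = [
    Result("GPT-4V", 74.86, "2023-03-15"),
    Result("SPHINX v2", 42.17, "2023-11-13"),
    Result("Qwen-VL-Chat", 37.55, "2023-08-24"),
    Result("CogVLM-Chat", 36.75, "2023-11-06"),
]

# Rank by Deductive score, descending, as the leaderboard does.
leaderboard = sorted(results, key=lambda r: r.deductive, reverse=True)
for rank, r in enumerate(leaderboard, start=1):
    print(f"{rank}. {r.model}: {r.deductive}")
```

Sorting on the metric rather than trusting row order makes the ranking robust if entries are added out of order.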