MuirBench

Introduced 2024-06-13

MuirBench is a benchmark of 11,264 images and 2,600 multiple-choice questions that provides robust evaluation across 12 multi-image understanding tasks.

  • MuirBench evaluates a comprehensive range of 12 multi-image understanding abilities (e.g. geographic understanding, diagram understanding, and visual retrieval), whereas prior benchmarks generally contain single-image questions.

  • MuirBench contains 10 diverse multi-image relations, such as narrative and complementary relations.

  • MuirBench provides a robust evaluation of models through unanswerable instance variants. The three major ways to create unanswerable instances are as follows.