Papers With Code 2 | ML Benchmarks, SotA Results & Code

MULTI-Benchmark is a benchmark for evaluating Multimodal Large Language Models (MLLMs). It tests how well models understand complex tables and images and how well they reason over long contexts¹. Here are its key features:

Multimodal Inputs: MULTI-Benchmark provides multimodal inputs and requires responses that are either precise or open-ended, reflecting real-life examination styles¹.
Variety of Tasks: It includes over 18,000 questions and challenges MLLMs with a variety of tasks, ranging from formula derivation to image detail analysis and cross-modality reasoning¹.
MULTI-Elite and MULTI-Extend: It introduces MULTI-Elite, a selected subset of 500 hard questions, and MULTI-Extend, which supplies more than 4,500 pieces of external knowledge context¹.
Evaluation: The results leave substantial room for improvement. GPT-4V reaches 63.7% accuracy on MULTI, while the other evaluated MLLMs score between 28.5% and 55.3%¹.
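MULTI reports a single accuracy figure per model. To show what such a figure means for precise-answer questions, the minimal sketch below scores predictions by exact match after light normalization and divides correct answers by total questions. The `Item` record, the `normalize` rule, and the exact-match criterion are assumptions made for illustration only; the official MULTI evaluation code in the GitHub repository listed in the sources may extract answers, award partial credit, or treat open-ended responses differently.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One benchmark question. Hypothetical schema, not the official MULTI format."""
    question_id: str
    reference: str   # gold answer for a precise-answer question
    prediction: str  # model answer after extraction from its full response


def normalize(answer: str) -> str:
    """Assumed normalization: remove all whitespace and uppercase the result."""
    return "".join(answer.split()).upper()


def exact_match_accuracy(items: list[Item]) -> float:
    """Fraction of items whose normalized prediction equals the normalized reference."""
    if not items:
        raise ValueError("no items to score")
    correct = sum(normalize(it.prediction) == normalize(it.reference) for it in items)
    return correct / len(items)


if __name__ == "__main__":
    demo = [
        Item("q1", "B", " b "),   # matches after normalization
        Item("q2", "AC", "A C"),  # matches after normalization
        Item("q3", "D", "A"),     # wrong answer
        Item("q4", "42", "42"),   # exact match
    ]
    print(f"accuracy = {exact_match_accuracy(demo):.1%}")  # prints "accuracy = 75.0%"
```

This only covers questions with a single precise answer; open-ended responses would need a different scorer, which is one reason a real benchmark harness is more involved than this sketch.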

(1) MULTI-Benchmark: Multimodal Understanding Leaderboard ... - GitHub. https://github.com/OpenDFM/MULTI-Benchmark.
(2) MULTI-Benchmark: Multimodal Understanding Leaderboard ... - GitHub. https://github.com/OpenDFM/MULTI-Benchmark.
(3) MULTI-Benchmark: Multimodal Understanding Leaderboard .... https://github.com/OpenDFM/MULTI-Benchmark/blob/main/README_zh.md.
(4) MultiBench Dataset | Papers With Code. https://paperswithcode.com/dataset/multibench.
(5) [2107.07502] MultiBench: Multiscale Benchmarks for Multimodal .... https://arxiv.org/abs/2107.07502.
(6) MultiBench: Multiscale Benchmarks for Multimodal .... https://www.x-mol.com/paper/1416118156688826368?adv.
(7) https://avatars.githubusercontent.com/u/139950066?v=4.
(8) https://github.com/OpenDFM/MULTI-Benchmark/blob/main/README_zh.md?raw=true.
(9) https://desktop.github.com.
(10) https://docs.github.com/articles/about-issue-and-pull-request-templates.
(11) https://github.com/OpenDFM/MULTI-Benchmark/raw/main/README_zh.md.
(12) https://OpenDFM.github.io/MULTI-Benchmark/.
(13) https://arxiv.org/abs/2402.03173/.
(14) https://huggingface.co/datasets/OpenDFM/MULTI-Benchmark.

MULTI

Related Benchmarks