Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

MULTI

Images · Texts · Introduced 2024-02-05

MULTI-Benchmark is a benchmark for evaluating Multimodal Large Language Models (MLLMs). It tests understanding of complex tables and images, and reasoning over long contexts¹. Key features of MULTI-Benchmark:

  • Multimodal Inputs: MULTI-Benchmark provides multimodal inputs and requires responses that are either precise or open-ended, reflecting real-life examination styles¹.
  • Variety of Tasks: It includes over 18,000 questions and challenges MLLMs with tasks ranging from formula derivation to image detail analysis and cross-modality reasoning¹.
  • MULTI-Elite and MULTI-Extend: It introduces MULTI-Elite, a hard subset of 500 selected questions, and MULTI-Extend, with more than 4,500 pieces of external knowledge context¹.
  • Evaluation: GPT-4V achieves 63.7% accuracy on MULTI, while other MLLMs score between 28.5% and 55.3%, which indicates substantial room for MLLM improvement¹.
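
The accuracy figures above are fractions of questions answered correctly. As a rough illustration only, the sketch below scores precise (for example multiple-choice) answers by exact match. It is a generic metric, not MULTI's official scoring rule, which this page does not describe; the function name `exact_match_accuracy` and the toy data are hypothetical and are not drawn from the benchmark.

```python
from typing import Sequence


def exact_match_accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Return the fraction of predictions that equal the reference answer.

    Comparison ignores case and surrounding whitespace. This is a generic
    metric (an assumption), not MULTI's official scoring rule.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot score an empty answer list")
    correct = sum(
        p.strip().lower() == a.strip().lower()
        for p, a in zip(predictions, answers)
    )
    return correct / len(answers)


# Hypothetical toy data: four multiple-choice questions, not taken from MULTI.
toy_answers = ["A", "C", "B", "D"]
toy_predictions = ["a", "C", "D", " D "]

toy_accuracy = exact_match_accuracy(toy_predictions, toy_answers)
print(f"accuracy = {toy_accuracy:.1%}")  # prints "accuracy = 75.0%"
```

Open-ended responses would need a different scorer (for example overlap-based or human grading), so exact match applies only to the precise-answer case.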

(1) MULTI-Benchmark: Multimodal Understanding Leaderboard ... - GitHub. https://github.com/OpenDFM/MULTI-Benchmark
(2) MULTI-Benchmark: Multimodal Understanding Leaderboard ... - GitHub. https://github.com/OpenDFM/MULTI-Benchmark
(3) MULTI-Benchmark: Multimodal Understanding Leaderboard .... https://github.com/OpenDFM/MULTI-Benchmark/blob/main/README_zh.md
(4) MultiBench Dataset | Papers With Code. https://paperswithcode.com/dataset/multibench
(5) [2107.07502] MultiBench: Multiscale Benchmarks for Multimodal .... https://arxiv.org/abs/2107.07502
(6) MultiBench: Multiscale Benchmarks for Multimodal .... https://www.x-mol.com/paper/1416118156688826368?adv
(7) https://avatars.githubusercontent.com/u/139950066?v=4
(8) https://github.com/OpenDFM/MULTI-Benchmark/blob/main/README_zh.md?raw=true
(9) https://desktop.github.com
(10) https://docs.github.com/articles/about-issue-and-pull-request-templates
(11) https://github.com/OpenDFM/MULTI-Benchmark/raw/main/README_zh.md
(12) https://OpenDFM.github.io/MULTI-Benchmark/
(13) https://arxiv.org/abs/2402.03173/
(14) https://huggingface.co/datasets/OpenDFM/MULTI-Benchmark

Statistics

Papers: 3
Benchmarks: 0