M2KR
Multi-task Multi-modal Knowledge Retrieval
M2KR is a collection of datasets for training and evaluating general-purpose vision-language retrievers. The datasets are released in Hugging Face Datasets format and cover three types of retrieval task:
- Image to Text (I2T) retrieval: This task involves retrieving relevant textual descriptions given an input image.
- Question to Text (Q2T) retrieval: Here, the goal is to retrieve relevant text passages based on a given question.
- Image & Question to Text (IQ2T) retrieval: This task combines both image and question inputs to retrieve relevant textual information.
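The three task types differ only in which inputs the query carries. A minimal sketch of that distinction follows, assuming a hypothetical `Query` record with optional `image_path` and `question` fields; these names are illustrative and are not part of the M2KR release.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Query:
    """Hypothetical query record; the field names are illustrative only."""
    image_path: Optional[str] = None
    question: Optional[str] = None


def task_type(query: Query) -> str:
    """Return the task label implied by the inputs the query provides."""
    has_image = query.image_path is not None
    has_question = query.question is not None
    if has_image and has_question:
        return "IQ2T"  # image and question together
    if has_image:
        return "I2T"   # image only
    if has_question:
        return "Q2T"   # question only
    raise ValueError("a query needs an image, a question, or both")


print(task_type(Query(image_path="cat.jpg")))
print(task_type(Query(question="Who painted the Mona Lisa?")))
print(task_type(Query(image_path="cat.jpg", question="What breed is this?")))
```

In all three cases the retrieval target is text; only the query side changes.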
The M2KR benchmark comprises nine datasets, each tailored for specific tasks. Some of these datasets include:
- WIT (Wikipedia-based Image Text): A dataset for I2T retrieval.
- IGLUE (Image-Grounded Language Understanding Evaluation): Used for I2T retrieval.
- KVQA (Knowledge-aware Visual Question Answering): Used for I2T retrieval.
- CC3M (Conceptual Captions 3M): Another dataset for I2T retrieval.
- OVEN (Open-domain Visual Entity Recognition): Used for IQ2T retrieval.
- LLaVA (Large Language and Vision Assistant): Used for IQ2T retrieval.
- OKVQA (Outside Knowledge Visual Question Answering): Used for IQ2T retrieval.
- Infoseek: A dataset for IQ2T retrieval.
- E-VQA (Encyclopedic Visual Question Answering): Used for IQ2T retrieval.
These datasets let researchers train and evaluate vision-language retrievers on image, question, and combined image-and-question queries within a single benchmark¹,².
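Retrieval benchmarks of this kind are commonly scored with Recall@K: the fraction of queries for which at least one relevant passage appears among the top K results. The sketch below is an assumption about a typical evaluation loop, not the benchmark's official scorer. The function name and the input layout (a ranked list of passage IDs per query, and a set of relevant passage IDs per query) are hypothetical.

```python
from typing import Dict, List, Set


def recall_at_k(ranked: Dict[str, List[str]],
                relevant: Dict[str, Set[str]],
                k: int) -> float:
    """Fraction of queries with at least one relevant passage in the top k."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not ranked:
        return 0.0
    hits = 0
    for query_id, passage_ids in ranked.items():
        gold = relevant.get(query_id, set())
        # A query counts as a hit if any of its top-k passages is relevant.
        if any(pid in gold for pid in passage_ids[:k]):
            hits += 1
    return hits / len(ranked)


# Toy data: q1 finds its relevant passage at rank 2, q2 at rank 3.
ranked = {"q1": ["p3", "p1", "p7"], "q2": ["p2", "p9", "p4"]}
relevant = {"q1": {"p1"}, "q2": {"p4"}}
print(recall_at_k(ranked, relevant, k=2))  # 0.5
```

The same function applies to all three task types, since each one returns a ranked list of text passages.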
(1) M2KR Benchmark Datasets - GitHub. https://github.com/LinWeizheDragon/FLMR/blob/main/docs/Datasets.md
(2) arXiv:2402.08327v1 [cs.CL], 13 Feb 2024. https://arxiv.org/pdf/2402.08327.pdf
(3) Scaling Up Fine-Grained Late-Interaction Multi-modal Retrievers. https://preflmr.github.io/