MIMIC-IT
Images | Texts | MIT license | Introduced 2023-05-05
MultI-Modal In-Context Instruction Tuning (MIMIC-IT) is a dataset for instruction tuning of multi-modal models, motivated by the interleaved-format upstream pretraining dataset of the Flamingo model. Each data sample consists of a queried image-instruction-answer triplet, in which the instruction and answer are tailored to the image, together with a context. The context contains a series of image-instruction-answer triplets that correlate contextually with the queried triplet, emulating the relationship between the context and the queried image-text pair found in the MMC4 dataset.
Source: Otter: A Multi-Modal Model with In-Context Instruction Tuning
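The sample layout described above can be sketched in Python. This is only an illustrative model of "queried triplet plus context triplets", not the dataset's actual schema or loader: the class names, field names, and prompt template below are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Triplet:
    """One image-instruction-answer triplet (field names are hypothetical)."""
    image: str        # assumed: an image path or identifier
    instruction: str
    answer: str


@dataclass
class Sample:
    """A queried triplet plus the context triplets that correlate with it."""
    query: Triplet
    context: List[Triplet] = field(default_factory=list)


def to_prompt(sample: Sample) -> str:
    """Interleave the context triplets before the query.

    The query's answer is left blank so a model could complete it.
    The "<image:...> Q: ... A: ..." template is invented for illustration.
    """
    parts = [
        f"<image:{t.image}> Q: {t.instruction} A: {t.answer}"
        for t in sample.context
    ]
    parts.append(f"<image:{sample.query.image}> Q: {sample.query.instruction} A:")
    return "\n".join(parts)


# Toy data, made up for the example.
sample = Sample(
    query=Triplet("img_003.jpg", "What is the person holding?", "An umbrella."),
    context=[
        Triplet("img_001.jpg", "What is the weather like?", "It is raining."),
        Triplet("img_002.jpg", "Where is the person standing?", "At a bus stop."),
    ],
)
prompt = to_prompt(sample)
```

Keeping the context as an ordered list mirrors the interleaved format the description refers to: each context triplet appears in full, and the queried triplet comes last with its answer withheld.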