
MultI-Modal In-Context Instruction Tuning (MIMIC-IT) is a dataset for instruction tuning of multi-modal models, motivated by the interleaved format of the upstream pretraining dataset used by the Flamingo model. Each data sample consists of a queried image-instruction-answer triplet, in which the instruction and answer are tailored to the image, together with a context. The context contains a series of image-instruction-answer triplets that are contextually correlated with the queried triplet, emulating the relationship between the context and the queried image-text pair found in the MMC4 dataset.

Source: Otter: A Multi-Modal Model with In-Context Instruction Tuning
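
The sample structure described above can be sketched in Python as a rough illustration. This is not the dataset's actual schema or loader: the class names `Triplet` and `Sample`, the field names, the `<image:...>` placeholder, and the prompt layout produced by `to_interleaved_prompt` are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Triplet:
    """One image-instruction-answer triplet (field names are hypothetical)."""
    image: str        # assumed here to be an image identifier or file path
    instruction: str
    answer: str


@dataclass
class Sample:
    """A queried triplet plus the context triplets that correlate with it."""
    query: Triplet
    context: List[Triplet] = field(default_factory=list)


def to_interleaved_prompt(sample: Sample) -> str:
    """Flatten a sample into an interleaved image-text prompt.

    Context triplets come first, each with its answer; the queried triplet
    comes last with the answer left open for the model to complete.
    The textual layout is an assumption, not the format used by the authors.
    """
    parts = [
        f"<image:{t.image}> Instruction: {t.instruction} Answer: {t.answer}"
        for t in sample.context
    ]
    q = sample.query
    parts.append(f"<image:{q.image}> Instruction: {q.instruction} Answer:")
    return "\n".join(parts)
```

Calling `to_interleaved_prompt` on a `Sample` with two context triplets yields three lines: two fully answered in-context examples followed by the query, whose answer is withheld.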
