ColonINST-v1 (Unseen)

ColonINST is a large-scale instruction-tuning dataset designed for multimodal analysis in colonoscopy. It comprises 303,001 colonoscopy images across 62 categories, including 128,620 positive and 174,381 negative cases, collected from 19 publicly available datasets. We enhanced 128,620 colonoscopy images with detailed captions using a pipeline that interacts with GPT-4V through custom prompts, enriching the dataset for AI model training. Finally, we restructured the data into 450,724 visual dialogues that guide an AI model through four downstream tasks critical for multimodal medical AI applications.
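As a quick sanity check on the figures quoted above, the following minimal sketch confirms that the positive and negative counts sum to the stated total and derives a few ratios. The function name `summarize_counts` and the derived keys are illustrative assumptions, not part of any ColonINST tooling; the dialogues-per-image value is only an average and does not imply that dialogues are spread evenly across images.

```python
# Figures quoted in the dataset description above.
TOTAL_IMAGES = 303_001
POSITIVE_CASES = 128_620
NEGATIVE_CASES = 174_381
DIALOGUES = 450_724


def summarize_counts(total, positive, negative, dialogues):
    """Check that the split is consistent and return simple derived ratios.

    Hypothetical helper for illustration; not part of the dataset release.
    """
    if positive + negative != total:
        raise ValueError(
            f"positive + negative = {positive + negative}, expected {total}"
        )
    return {
        "positive_fraction": positive / total,
        "negative_fraction": negative / total,
        # Average only; the real per-image distribution may be uneven.
        "dialogues_per_image": dialogues / total,
    }


summary = summarize_counts(
    TOTAL_IMAGES, POSITIVE_CASES, NEGATIVE_CASES, DIALOGUES
)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

The same check can be rerun against counts computed from a local copy of the data to confirm that a download is complete.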