ColonINST-v1

Introduced 2024-10-22

ColonINST is a large-scale instruction-tuning dataset for multimodal analysis in colonoscopy. It comprises 303,001 colonoscopy images across 62 categories, of which 128,620 are positive and 174,381 are negative cases, collected from 19 publicly available datasets. We enriched 128,620 colonoscopy images with detailed captions generated by a pipeline that queries GPT-4V with custom prompts. Finally, we restructured the data into 450,724 visual dialogues that guide an AI model through four downstream tasks critical for multimodal medical AI applications.
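
As a quick sanity check on the figures quoted above, the following minimal Python sketch confirms that the positive and negative counts add up to the stated total and derives two ratios from them. The constant names and the `summarize` helper are hypothetical and exist only for this illustration; they are not part of any ColonINST tooling, and the numbers are copied from the description rather than read from the dataset files.

```python
# Counts copied from the dataset description (not read from the data itself).
TOTAL_IMAGES = 303_001
POSITIVE_IMAGES = 128_620
NEGATIVE_IMAGES = 174_381
VISUAL_DIALOGUES = 450_724


def summarize() -> dict:
    """Return simple ratios derived from the published counts (hypothetical helper)."""
    # The positive and negative cases should partition the full image set.
    if POSITIVE_IMAGES + NEGATIVE_IMAGES != TOTAL_IMAGES:
        raise ValueError("positive + negative does not match the stated total")
    return {
        "positive_fraction": POSITIVE_IMAGES / TOTAL_IMAGES,
        "negative_fraction": NEGATIVE_IMAGES / TOTAL_IMAGES,
        "dialogues_per_image": VISUAL_DIALOGUES / TOTAL_IMAGES,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.3f}")
```

The ratios are only averages over the whole collection; they say nothing about how dialogues are distributed across individual images or tasks.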

Please cite our work if you like it!

@article{ji2024frontiers,
  author = {Ji, Ge-Peng and Liu, Jingyi and Xu, Peng and Barnes, Nick and Khan, Fahad Shahbaz and Khan, Salman and Fan, Deng-Ping},
  title = {Frontiers in Intelligent Colonoscopy},
  journal = {arXiv preprint arXiv:2410.17241},
  year = {2024}
}