
IVM-Mix-1M provides over 1M image-instruction pairs with corresponding instruction-relevant mask labels. The dataset consists of three parts: HumanLabelData, RobotMachineData, and VQAMachineData. For HumanLabelData and RobotMachineData, we provide well-organized images, mask labels, and language instructions. For VQAMachineData, we provide only mask labels and language instructions; please refer to https://huggingface.co/datasets/2toINF/IVM-Mix-1M and download the images from the constituent datasets.
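
As a rough illustration of the download step, the sketch below fetches the repository with `snapshot_download` from the `huggingface_hub` package and records which parts ship with images. The helper names (`needs_external_images`, `download_ivm_mix`) and the `local_dir` default are hypothetical, and the file layout inside the repository is not assumed here; check the dataset page for the actual structure.

```python
# Minimal sketch, not the authors' official loader.
# Assumption: the `huggingface_hub` package is installed (pip install huggingface_hub).

REPO_ID = "2toINF/IVM-Mix-1M"  # repository named in the dataset description

# Which parts ship with images, according to the description above.
PART_HAS_IMAGES = {
    "HumanLabelData": True,
    "RobotMachineData": True,
    "VQAMachineData": False,  # only mask labels and language instructions
}


def needs_external_images(part: str) -> bool:
    """Return True if images for `part` must be fetched from the source datasets."""
    if part not in PART_HAS_IMAGES:
        raise ValueError(f"unknown part: {part!r}")
    return not PART_HAS_IMAGES[part]


def download_ivm_mix(local_dir: str = "IVM-Mix-1M") -> str:
    """Download the dataset repository and return the local path.

    `local_dir` is a hypothetical default; the full repository may be large.
    """
    # Imported lazily so the helpers above work without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",
        local_dir=local_dir,
    )


# Example usage (performs a network download, so it is left commented out):
# path = download_ivm_mix()
# print("Downloaded to", path)
```

After downloading, `needs_external_images("VQAMachineData")` serves as a reminder that the images for that part still have to be obtained separately from the constituent datasets.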