IDRCell-100k
We introduce the IDRCell100K image dataset, a collection of biological images curated from the large and varied Image Data Resource (IDR) platform. Using the metadata provided with each experiment, we selected images spanning a range of microscopy techniques so that the dataset covers diverse imaging modalities. We also tried to limit experimental and imaging biases by balancing the representation of experiments as far as was feasible, which reduces the dataset's dependence on any single imaging modality or experiment.
To create a well-rounded dataset, we focused on cell culture experiments from the Image Data Resource. We picked 79 distinct experiments conducted under different conditions and for different scientific purposes. These experiments employed 7 types of microscopy techniques and fell into 6 categories of study.
Because the number of images differs from one experiment to another, we selected 1,300 images from each chosen experiment to keep the final dataset balanced. These images come from experiments using different methods and include a wide range of channels that monitor various cellular components. Altogether, this yields 308,898 single-channel images, which we resized to 224x224 pixels from a variety of original sizes. Combining the channels results in 104,093 multiplexed microscopy images containing cells at various scales, with each image composed of 1 to 10 channels.
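
The balancing and channel-combining steps described above can be illustrated with a minimal sketch. It is not the pipeline used to build the dataset. The function names, the record layout (image ID, channel index, pixel data), and the choice to cap small experiments at however many images they have are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def sample_balanced(images_by_experiment, per_experiment=1300, seed=0):
    """Draw the same number of image IDs from every experiment.

    `images_by_experiment` is a hypothetical mapping from an experiment
    name to the list of image IDs available for it.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    selected = {}
    for experiment, image_ids in sorted(images_by_experiment.items()):
        # Assumption: an experiment with fewer images than the quota
        # contributes everything it has.
        k = min(per_experiment, len(image_ids))
        selected[experiment] = rng.sample(sorted(image_ids), k)
    return selected


def group_channels(channel_records):
    """Combine single-channel records into multi-channel images.

    Each record is a hypothetical (image_id, channel_index, pixels) tuple.
    The result maps each image ID to its channel planes ordered by index,
    so images may have different numbers of channels.
    """
    grouped = defaultdict(dict)
    for image_id, channel_index, pixels in channel_records:
        grouped[image_id][channel_index] = pixels
    return {
        image_id: [channels[i] for i in sorted(channels)]
        for image_id, channels in grouped.items()
    }
```

Called on a toy mapping with one large and one small experiment, `sample_balanced` returns 1,300 IDs for the first and all available IDs for the second. Given the counts reported above, the combined images hold 308,898 / 104,093 ≈ 2.97 channels on average.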