SuSy Dataset
Images | Multiple licenses (see description) | Introduced 2024-09-21
The SuSy Dataset combines authentic photographs and AI-generated images designed for training and evaluating synthetic image detection models. It contains over 25,000 images from six different sources, including real-world photographs from COCO and synthetic images created by state-of-the-art diffusion models such as DALL-E 3, Midjourney, and Stable Diffusion.
Authentic Images
- COCO (Common Objects in Context): A large-scale object detection, segmentation, and captioning dataset. It includes over 330,000 images, with 200,000 labeled using 80 object categories. For this dataset, we use a random subset of 5,435 images. License: Creative Commons Attribution 4.0 license
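The description says only that the subset is random; it does not state how it was drawn. As a minimal sketch of one reproducible way to draw such a subset, assuming a flat list of image identifiers, the following could be used. The function name `sample_subset`, the seed value, and the stand-in identifier range are all hypothetical and not taken from the dataset authors.

```python
import random


def sample_subset(image_ids, k=5435, seed=0):
    """Return k distinct ids drawn uniformly at random, reproducibly.

    The seed is an assumption made for illustration; the dataset
    description does not say which seed (if any) was used.
    """
    rng = random.Random(seed)  # private RNG, leaves global state untouched
    return rng.sample(list(image_ids), k)  # sampling without replacement


# Hypothetical stand-in for the full list of COCO image identifiers.
all_ids = range(330_000)
subset = sample_subset(all_ids)
```

Sampling without replacement guarantees that no image appears twice, and fixing the seed means the same subset can be regenerated later.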
Synthetic Images
- dalle-3-images: Contains 3,310 unique images generated using DALL-E 3. The dataset does not include the prompts used to generate the images. License: MIT license
- diffusiondb: A large-scale text-to-image prompt dataset containing 14 million images generated by Stable Diffusion 1.x series models (2022). We use a random subset of 5,435 images. License: CC0 1.0 Universal license
- realisticSDXL: Contains images generated using the Stable Diffusion XL (SDXL) model released in July 2023. We use only the "realistic" category, which contains 5,435 images. License: CreativeML OpenRAIL-M license
- midjourney-tti: Contains images generated using Midjourney V1 or V2 models (early 2022). The original dataset provided only URLs, which were scraped to obtain the images. License: CC0 1.0 Universal license (applies to the links only; the images are the property of the users who generated them)
- midjourney-images: Contains 4,308 unique images generated using Midjourney V5 and V6 models (2023). License: MIT license
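The per-source counts listed above can be collected into a small manifest to check them against the stated total. This is only a sketch: the dictionary layout and the names `SOURCES`, `known_total`, and `unknown` are hypothetical, and `None` marks the one source whose image count the description does not give.

```python
# Image counts as stated in the description.
# None marks a count that the description does not provide.
SOURCES = {
    "coco": {"kind": "authentic", "images": 5435},
    "dalle-3-images": {"kind": "synthetic", "images": 3310},
    "diffusiondb": {"kind": "synthetic", "images": 5435},
    "realisticSDXL": {"kind": "synthetic", "images": 5435},
    "midjourney-tti": {"kind": "synthetic", "images": None},
    "midjourney-images": {"kind": "synthetic", "images": 4308},
}

# Sum only the counts that are actually stated.
known_total = sum(
    s["images"] for s in SOURCES.values() if s["images"] is not None
)
# Sources whose size is not given in the description.
unknown = [name for name, s in SOURCES.items() if s["images"] is None]

print(known_total)  # 23923
print(unknown)      # ['midjourney-tti']
```

Because the description reports more than 25,000 images overall, the one unlisted source would have to contribute more than 1,077 images. This is an inference from the stated figures, not a number given by the authors.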