Digital Forensics 2023 dataset - DF2023
The deliberate manipulation of public opinion, especially through altered images, poses a significant danger to society. To address this issue on a technical level, we support the research community by releasing the Digital Forensics 2023 (DF2023) training and validation dataset.
The DF2023 training dataset comprises one million images from four major forgery categories:
- splicing (400K)
- copy-move (300K)
- enhancement (200K)
- removal (100K)
This dataset enables an objective comparison of network architectures and can significantly reduce the time and effort researchers spend preparing datasets.
For a detailed description of the DF2023 dataset, please refer to:
@inproceedings{Fischinger2023DFNet,
  title={DF2023: The Digital Forensics 2023 Dataset for Image Forgery Detection},
  author={David Fischinger and Martin Boyer},
  journal={The 25th Irish Machine Vision and Image Processing conference. (IMVIP)},
  year={2023}
}
available from: Zenodo
Naming convention
The naming convention of DF2023 encodes information about the applied manipulations. Each image name has the following form:
COCO_DF_0123456789_NNNNNNNN.{EXT} (e.g. COCO_DF_E000G40117_00200620.jpg)
After the identifier of the image data source ("COCO") and the self-reference to the Digital Forensics ("DF") dataset, there are 10 characters (positions 0-9) that encode the manipulation. Position 0 defines the manipulation type, and positions 1-9 describe the manipulations applied to the donor image patch:
- Position 0: the manipulation type, one of copy-move, splicing, removal or enhancement (C, S, R, E).
- Positions 1, 2, 7 and 8 (resample, flip, noise and brightness): a binary value indicating whether this manipulation was applied to the donor image patch.
- Position 3 (rotate): a value of 0-3 indicating a rotation by 0, 90, 180 or 270 degrees.
- Position 4: whether BoxBlur (B) or GaussianBlur (G) was used.
- Position 5: the blurring radius. A value of 0 indicates that no blurring was applied.
- Position 6: which of the Python-PIL contrast filters EDGE_ENHANCE, EDGE_ENHANCE_MORE, SHARPEN, UnsharpMask or ImageEnhance (values 1-5) was applied. If none of them was applied, this value is set to 0.
- Position 9: the JPEG compression factor modulo 10. A value of 0 indicates that no JPEG compression was applied.
The 8 characters NNNNNNNN in the image name template are a running number of the images.
In the example above, E000G40117 denotes an enhancement (E) with no resampling, no flipping and a rotation of 0 degrees, GaussianBlur with radius 4, no contrast filter, noise and brightness changes applied, and a JPEG compression factor whose last digit is 7.
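The naming convention can be decoded programmatically. The following is a minimal sketch, not part of the official DF2023 tooling: the function name `parse_df2023_name`, the dictionary keys and the lookup tables are hypothetical names chosen for illustration. It assumes that file names follow the template exactly, and it keeps the raw character at position 4 if it is neither B nor G, since the description does not state which value is used when no blurring was applied.

```python
import re

# Template: COCO_DF_<10 manipulation characters>_<8-digit running number>.<ext>
_NAME_RE = re.compile(
    r"^COCO_DF_"
    r"(?P<code>[CSRE]\d\d[0-3][A-Za-z0-9]\d[0-5]\d\d\d)"
    r"_(?P<number>\d{8})\.(?P<ext>\w+)$"
)

# Lookup tables taken from the description of positions 0, 3, 4 and 6.
MANIPULATION_TYPES = {
    "C": "copy-move",
    "S": "splicing",
    "R": "removal",
    "E": "enhancement",
}
ROTATION_DEGREES = (0, 90, 180, 270)
BLUR_TYPES = {"B": "BoxBlur", "G": "GaussianBlur"}
CONTRAST_FILTERS = (
    None,  # 0: no contrast filter applied
    "EDGE_ENHANCE",
    "EDGE_ENHANCE_MORE",
    "SHARPEN",
    "UnsharpMask",
    "ImageEnhance",
)


def parse_df2023_name(filename):
    """Decode a DF2023 image file name into a dict of manipulation settings."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"not a DF2023 file name: {filename!r}")
    code = match.group("code")

    blur_radius = int(code[5])
    # A radius of 0 means no blurring, so the blur type is reported as None.
    # Unknown characters at position 4 are passed through unchanged.
    blur_type = BLUR_TYPES.get(code[4], code[4]) if blur_radius > 0 else None

    return {
        "manipulation": MANIPULATION_TYPES[code[0]],
        "resample": code[1] != "0",
        "flip": code[2] != "0",
        "rotation_degrees": ROTATION_DEGREES[int(code[3])],
        "blur_type": blur_type,
        "blur_radius": blur_radius,
        "contrast_filter": CONTRAST_FILTERS[int(code[6])],
        "noise": code[7] != "0",
        "brightness": code[8] != "0",
        # JPEG compression factor modulo 10; 0 means no JPEG compression.
        "jpeg_factor_mod_10": int(code[9]),
        "running_number": int(match.group("number")),
        "extension": match.group("ext"),
    }


if __name__ == "__main__":
    info = parse_df2023_name("COCO_DF_E000G40117_00200620.jpg")
    for key, value in info.items():
        print(f"{key}: {value}")
```

Because the manipulation type is part of the file name, a parser like this can be used to filter or stratify the dataset by category without loading any image data.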