Myung-Joon Kwon, Seung-Hun Nam, In-Jae Yu, Heung-Kyu Lee, Changick Kim
Detecting and localizing image manipulation are necessary to counter malicious use of image editing techniques. Accordingly, it is essential to distinguish between authentic and tampered regions by analyzing intrinsic statistics in an image. We focus on JPEG compression artifacts left during image acquisition and editing. We propose a convolutional neural network (CNN) that uses discrete cosine transform (DCT) coefficients, where compression artifacts remain, to localize image manipulation. Standard CNNs cannot learn the distribution of DCT coefficients because the convolution throws away the spatial coordinates, which are essential for DCT coefficients. We illustrate how to design and train a neural network that can learn the distribution of DCT coefficients. Furthermore, we introduce Compression Artifact Tracing Network (CAT-Net) that jointly uses image acquisition artifacts and compression artifacts. It significantly outperforms traditional and deep neural network-based methods in detecting and localizing tampered regions.
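To make the role of DCT coefficients concrete, here is a minimal sketch of the 8×8 blockwise DCT that JPEG applies to each image channel. It is an illustration only, not the CAT-Net architecture or preprocessing: the function names (`dct_matrix`, `blockwise_dct`, `frequency_channels`) are hypothetical, and JPEG's level shift and quantization steps are omitted. The last helper regroups coefficients so that each of the 64 frequencies becomes its own channel, which is one simple way to keep the within-block position that a plain translation-invariant convolution over the coefficient map would ignore.

```python
import numpy as np

def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    x = np.arange(n).reshape(1, -1)  # sample index (columns)
    d = np.cos(np.pi * (2 * x + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    d[0, :] = np.sqrt(1.0 / n)       # DC row has a different normalization
    return d

def blockwise_dct(channel: np.ndarray, block: int = 8) -> np.ndarray:
    """2D DCT of each non-overlapping block.

    Returns an array of shape (H/block, W/block, block, block).
    Assumes H and W are multiples of `block` (no padding is done here).
    """
    h, w = channel.shape
    if h % block or w % block:
        raise ValueError("image size must be a multiple of the block size")
    d = dct_matrix(block)
    # (H, W) -> (H/b, b, W/b, b) -> (H/b, W/b, b, b)
    blocks = channel.reshape(h // block, block, w // block, block)
    blocks = blocks.transpose(0, 2, 1, 3)
    # D @ B @ D^T, broadcast over the two leading block-grid axes
    return d @ blocks @ d.T

def frequency_channels(coeffs: np.ndarray) -> np.ndarray:
    """Regroup (H/b, W/b, b, b) coefficients into (b*b, H/b, W/b).

    Channel i holds one fixed frequency for every block, so the
    position inside the 8x8 block is encoded by the channel index.
    """
    nb_h, nb_w, b, _ = coeffs.shape
    return coeffs.reshape(nb_h, nb_w, b * b).transpose(2, 0, 1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    luma = rng.uniform(0, 255, size=(16, 24))  # stand-in for a Y channel
    coeffs = blockwise_dct(luma)
    print(coeffs.shape)                      # block grid plus 8x8 coefficients
    print(frequency_channels(coeffs).shape)  # 64 frequency channels
```

Because the basis is orthonormal, a constant block of value `c` yields a DC coefficient of `8 * c` and zero AC coefficients, which is a quick sanity check on the transform.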
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Manipulation Detection | COVERAGE | AUC | 0.68 | CAT-Net v2 |
| Image Manipulation Detection | COVERAGE | Balanced Accuracy | 0.635 | CAT-Net v2 |
| Image Manipulation Detection | Columbia | AUC | 0.977 | CAT-Net v2 |
| Image Manipulation Detection | Columbia | Balanced Accuracy | 0.803 | CAT-Net v2 |
| Image Manipulation Detection | CocoGlide | AUC | 0.667 | CAT-Net v2 |
| Image Manipulation Detection | CocoGlide | Balanced Accuracy | 0.58 | CAT-Net v2 |
| Image Manipulation Detection | DSO-1 | AUC | 0.747 | CAT-Net v2 |
| Image Manipulation Detection | DSO-1 | Balanced Accuracy | 0.525 | CAT-Net v2 |
| Image Manipulation Detection | Casia V1+ | AUC | 0.942 | CAT-Net v2 |
| Image Manipulation Detection | Casia V1+ | Balanced Accuracy | 0.838 | CAT-Net v2 |
| Image Manipulation Localization | Columbia | Average Pixel F1(Fixed threshold) | 0.859 | CAT-Net v2 |
| Image Manipulation Localization | Columbia(Protocol-CAT) | Pixel Binary F1 | 0.915 | CAT-Net |
| Image Manipulation Localization | NIST16(Protocol-CAT) | Pixel Binary F1 | 0.252 | CAT-Net |
| Image Manipulation Localization | CASIAv1(Protocol-CAT) | Pixel Binary F1 | 0.808 | CAT-Net |
| Image Manipulation Localization | COVERAGE | Average Pixel F1(Fixed threshold) | 0.381 | CAT-Net v2 |
| Image Manipulation Localization | COVERAGE(Protocol-CAT) | Pixel Binary F1 | 0.427 | CAT-Net |
| Image Manipulation Localization | Casia V1+ | Average Pixel F1(Fixed threshold) | 0.752 | CAT-Net v2 |
| Image Manipulation Localization | CocoGlide | Average Pixel F1(Fixed threshold) | 0.434 | CAT-Net v2 |
| Image Manipulation Localization | DSO-1 | Average Pixel F1(Fixed threshold) | 0.584 | CAT-Net v2 |
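
For readers unfamiliar with the metrics in the table, the sketch below shows the standard definitions of pixel-level F1 at a fixed threshold and of balanced accuracy. It is not the benchmark's evaluation code: the threshold of 0.5, the function names, and the handling of empty masks are assumptions made for illustration.

```python
import numpy as np

def pixel_f1(pred_prob: np.ndarray, gt_mask: np.ndarray,
             threshold: float = 0.5) -> float:
    """F1 over pixels after binarizing the predicted map.

    `threshold` = 0.5 is an assumed value; the table only says "fixed".
    """
    pred = pred_prob >= threshold
    gt = gt_mask.astype(bool)
    tp = np.sum(pred & gt)    # tampered pixels correctly flagged
    fp = np.sum(pred & ~gt)   # authentic pixels wrongly flagged
    fn = np.sum(~pred & gt)   # tampered pixels missed
    denom = 2 * tp + fp + fn
    # Assumed convention: score 1.0 when both masks are empty.
    return float(2 * tp / denom) if denom > 0 else 1.0

def balanced_accuracy(scores: np.ndarray, labels: np.ndarray,
                      threshold: float = 0.5) -> float:
    """Mean of true-positive rate and true-negative rate.

    `scores` are per-image tampering scores, `labels` are 1 for
    tampered and 0 for authentic. Both classes must be present.
    """
    pred = scores >= threshold
    y = labels.astype(bool)
    if y.all() or (~y).all():
        raise ValueError("need at least one sample of each class")
    tpr = np.sum(pred & y) / np.sum(y)
    tnr = np.sum(~pred & ~y) / np.sum(~y)
    return float((tpr + tnr) / 2)

if __name__ == "__main__":
    prob = np.array([[0.9, 0.2], [0.7, 0.1]])
    mask = np.array([[1, 0], [0, 0]])
    print(pixel_f1(prob, mask))

    scores = np.array([0.9, 0.8, 0.6, 0.1])
    labels = np.array([1, 1, 0, 0])
    print(balanced_accuracy(scores, labels))
```

Balanced accuracy averages the per-class rates, so it is not inflated when authentic and tampered images are unevenly represented in a test set.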