Xinru Chen, Chengbo Dong, Jiaqi Ji, Juan Cao, Xirong Li
The key challenge of image manipulation detection is learning generalizable features that are sensitive to manipulations in novel data, yet specific enough to prevent false alarms on authentic images. Current research emphasizes sensitivity and overlooks specificity. In this paper we address both aspects through multi-view feature learning and multi-scale supervision. By exploiting the noise distribution and boundary artifacts surrounding tampered regions, the former aims to learn semantic-agnostic and thus more generalizable features. The latter allows us to learn from authentic images, which are nontrivial for current methods based on semantic segmentation networks to take into account. We realize these ideas in a new network, which we term MVSS-Net. Extensive experiments on five benchmark sets justify the viability of MVSS-Net for both pixel-level and image-level manipulation detection.
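
The distinction between sensitivity and specificity can be made concrete with a small sketch. It assumes binary image-level labels (1 for manipulated, 0 for authentic) and already-thresholded predictions; the function name `sensitivity_specificity` and the toy data are hypothetical and are not taken from the MVSS-Net code.

```python
def sensitivity_specificity(labels, predictions):
    """Return (sensitivity, specificity) for binary image-level decisions.

    labels / predictions: sequences of 0 (authentic) or 1 (manipulated).
    """
    tp = fn = tn = fp = 0
    for y, p in zip(labels, predictions):
        if y == 1 and p == 1:
            tp += 1      # manipulated image correctly flagged
        elif y == 1:
            fn += 1      # manipulated image missed
        elif p == 0:
            tn += 1      # authentic image correctly passed
        else:
            fp += 1      # false alarm on an authentic image
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Toy data (illustrative only): 3 manipulated and 5 authentic images.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
predictions = [1, 1, 0, 1, 0, 0, 0, 0]

sens, spec = sensitivity_specificity(labels, predictions)
# Balanced accuracy is commonly defined as the mean of the two rates.
balanced_accuracy = (sens + spec) / 2
print(f"sensitivity={sens:.3f} specificity={spec:.3f} "
      f"balanced accuracy={balanced_accuracy:.3f}")
```

A detector tuned only for sensitivity can score well on manipulated images while raising many false alarms on authentic ones; reporting both rates exposes that trade-off.
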
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Manipulation Detection | COVERAGE | AUC | 0.733 | MVSS-Net |
| Image Manipulation Detection | COVERAGE | Balanced Accuracy | 0.514 | MVSS-Net |
| Image Manipulation Detection | Columbia | AUC | 0.984 | MVSS-Net |
| Image Manipulation Detection | Columbia | Balanced Accuracy | 0.729 | MVSS-Net |
| Image Manipulation Detection | CocoGlide | AUC | 0.654 | MVSS-Net |
| Image Manipulation Detection | CocoGlide | Balanced Accuracy | 0.117 | MVSS-Net |
| Image Manipulation Detection | DSO-1 | AUC | 0.552 | MVSS-Net |
| Image Manipulation Detection | DSO-1 | Balanced Accuracy | 0.358 | MVSS-Net |
| Image Manipulation Detection | Casia V1+ | AUC | 0.932 | MVSS-Net |
| Image Manipulation Detection | Casia V1+ | Balanced Accuracy | 0.528 | MVSS-Net |
| Image Manipulation Localization | Columbia | Average Pixel F1 (fixed threshold) | 0.729 | MVSS-Net |
| Image Manipulation Localization | COVERAGE | Average Pixel F1 (fixed threshold) | 0.514 | MVSS-Net |
| Image Manipulation Localization | Casia V1+ | Average Pixel F1 (fixed threshold) | 0.528 | MVSS-Net |
| Image Manipulation Localization | CocoGlide | Average Pixel F1 (fixed threshold) | 0.486 | MVSS-Net |
| Image Manipulation Localization | DSO-1 | Average Pixel F1 (fixed threshold) | 0.358 | MVSS-Net |
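
As a rough illustration of the localization metric in the table, the sketch below computes a pixel-level F1 score at a fixed threshold and averages it over images. The threshold of 0.5, the handling of the empty case, and the names `pixel_f1` and `average_pixel_f1` are assumptions made for this example; the evaluation protocol behind the reported numbers may differ.

```python
def pixel_f1(prob_map, gt_mask, threshold=0.5):
    """Pixel-level F1 for one image.

    prob_map: 2D list of manipulation probabilities in [0, 1].
    gt_mask:  2D list of ground-truth labels (1 = tampered pixel).
    threshold: fixed binarization threshold (0.5 is an assumption here).
    """
    tp = fp = fn = 0
    for prob_row, gt_row in zip(prob_map, gt_mask):
        for prob, gt in zip(prob_row, gt_row):
            pred = 1 if prob >= threshold else 0
            if pred == 1 and gt == 1:
                tp += 1
            elif pred == 1:
                fp += 1
            elif gt == 1:
                fn += 1
    denom = 2 * tp + fp + fn
    # With no tampered pixels predicted or present, F1 is undefined;
    # returning 0.0 is a convention chosen for this sketch.
    return 2 * tp / denom if denom else 0.0


def average_pixel_f1(prob_maps, gt_masks, threshold=0.5):
    """Mean of per-image pixel F1 scores at one fixed threshold."""
    scores = [pixel_f1(p, g, threshold) for p, g in zip(prob_maps, gt_masks)]
    return sum(scores) / len(scores) if scores else 0.0


# Toy 2x2 example (illustrative only).
prob_map = [[0.9, 0.2],
            [0.6, 0.4]]
gt_mask = [[1, 0],
           [0, 1]]
score = pixel_f1(prob_map, gt_mask)
print(f"pixel F1 = {score:.2f}")
```

Using one fixed threshold for all images, rather than choosing the best threshold per image, avoids relying on ground truth at test time.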