ObjectFormer for Image Manipulation Detection and Localization

Junke Wang, Zuxuan Wu, Jingjing Chen, Xintong Han, Abhinav Shrivastava, Ser-Nam Lim, Yu-Gang Jiang

2022-03-28 | CVPR 2022 | Image Manipulation Localization, Image Manipulation, Image Manipulation Detection

Abstract

Recent advances in image editing techniques have posed serious challenges to the trustworthiness of multimedia data, which drives the research of image tampering detection. In this paper, we propose ObjectFormer to detect and localize image manipulations. To capture subtle manipulation traces that are no longer visible in the RGB domain, we extract high-frequency features of the images and combine them with RGB features as multimodal patch embeddings. Additionally, we use a set of learnable object prototypes as mid-level representations to model the object-level consistencies among different regions, which are further used to refine patch embeddings to capture the patch-level consistencies. We conduct extensive experiments on various datasets and the results verify the effectiveness of the proposed method, outperforming state-of-the-art tampering detection and localization methods.
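
The idea of pairing high-frequency features with RGB features as multimodal patch embeddings can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say how the high-frequency component is obtained, so the FFT-based high-pass filter, the `cutoff` parameter, the patch size, and all function names below are assumptions chosen only for illustration. In the actual model the patches would presumably pass through learned encoders rather than being concatenated raw.

```python
import numpy as np


def high_frequency_component(gray, cutoff=0.25):
    """High-pass filter a 2-D image in the Fourier domain.

    `cutoff` (hypothetical parameter) is the normalized radius below which
    frequencies are treated as low-frequency content and zeroed out.
    """
    h, w = gray.shape
    spectrum = np.fft.fftshift(np.fft.fft2(gray))
    yy, xx = np.mgrid[0:h, 0:w]
    # Normalized distance from the spectrum center (DC term after fftshift).
    dist = np.sqrt(((yy - h // 2) / (h / 2)) ** 2 + ((xx - w // 2) / (w / 2)) ** 2)
    spectrum[dist < cutoff] = 0
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum)))


def to_patches(x, patch):
    """Split an (H, W, C) array into (num_patches, patch * patch * C) rows."""
    h, w, c = x.shape
    x = x[: h - h % patch, : w - w % patch]  # crop to a multiple of the patch size
    gh, gw = x.shape[0] // patch, x.shape[1] // patch
    x = x.reshape(gh, patch, gw, patch, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(gh * gw, patch * patch * c)


def multimodal_patch_embeddings(rgb, patch=16):
    """Concatenate raw RGB patches with high-frequency patches (illustrative only)."""
    gray = rgb.mean(axis=2)
    hf = high_frequency_component(gray)[..., None]
    return np.concatenate([to_patches(rgb, patch), to_patches(hf, patch)], axis=1)
```

For a 64x64 RGB input and 16x16 patches, this yields 16 patch vectors, each holding 768 RGB values followed by 256 high-frequency values. A perfectly flat image produces a high-frequency component of (numerically) zero, which is the behavior one would expect from any high-pass filter.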

Results

Task	Dataset	Metric	Value	Model
Image Manipulation Localization	Columbia(Protocol-CAT)	Pixel Binary F1	0.732	ObjectFormer
Image Manipulation Localization	NIST16(Protocol-CAT)	Pixel Binary F1	0.252	ObjectFormer
Image Manipulation Localization	CASIAv1(Protocol-CAT)	Pixel Binary F1	0.531	ObjectFormer
Image Manipulation Localization	COVERAGE(Protocol-CAT)	Pixel Binary F1	0.257	ObjectFormer
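
For readers unfamiliar with the metric in the table, a pixel-level binary F1 score compares a thresholded predicted mask against a ground-truth tampering mask. The sketch below shows the standard definition, F1 = 2TP / (2TP + FP + FN), for a single image. The 0.5 threshold, the function name, and the choice to return 0.0 when both masks are empty are assumptions; the listing does not specify how Protocol-CAT thresholds predictions or averages scores across images.

```python
import numpy as np


def pixel_binary_f1(pred_mask, gt_mask, threshold=0.5):
    """Pixel-level binary F1 for one image.

    pred_mask: per-pixel tampering scores in [0, 1].
    gt_mask:   binary ground-truth mask (1 = tampered pixel).
    threshold: assumed binarization threshold, not taken from the source.
    """
    pred = np.asarray(pred_mask) >= threshold
    gt = np.asarray(gt_mask).astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    denom = 2 * tp + fp + fn
    # Assumed convention: score 0.0 when there are no positives at all.
    return float(2 * tp / denom) if denom > 0 else 0.0
```

With a prediction of `[[0.9, 0.2], [0.8, 0.1]]` and a ground truth of `[[1, 0], [0, 1]]`, there is one true positive, one false positive, and one false negative, so the score is 2 / 4 = 0.5.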

Related Papers

Beyond Fully Supervised Pixel Annotations: Scribble-Driven Weakly-Supervised Framework for Image Manipulation Localization (2025-07-17)
Towards Reliable Identification of Diffusion-based Image Manipulations (2025-06-05)
UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation (2025-06-03)
Weakly-supervised Localization of Manipulated Image Regions Using Multi-resolution Learned Features (2025-05-29)
RBench-V: A Primary Assessment for Visual Reasoning Models with Multi-modal Outputs (2025-05-22)
My Face Is Mine, Not Yours: Facial Protection Against Diffusion Model Face Swapping (2025-05-21)
Visual Agentic Reinforcement Fine-Tuning (2025-05-20)
Emerging Properties in Unified Multimodal Pretraining (2025-05-20)