LADMIM: Logical Anomaly Detection with Masked Image Modeling in Discrete Latent Space

Shunsuke Sakai, Tatsuhito Hasegawa, Makoto Koshino

2024-10-14 | Self-Supervised Learning | Anomaly Detection

Abstract

Detecting anomalies such as incorrect combinations of objects or deviations in their positions is a challenging problem in industrial anomaly detection. Traditional methods mainly focus on local features of normal images, and thus on local defects such as scratches and dirt, which makes it difficult to detect anomalies in the relationships between features. Masked image modeling (MIM) is a self-supervised learning technique that predicts the feature representation of masked regions in an image. Reconstructing the masked regions requires understanding how the image is composed, so the model learns the relationships between features within the image. We propose a novel approach that leverages this property of MIM to detect logical anomalies effectively. To address blurriness in the reconstructed image, we replace pixel prediction with prediction of the probability distribution over discrete latent variables of the masked regions, obtained with a tokenizer. We evaluated the proposed method on the MVTec LOCO dataset, achieving an average AUC of 0.867 and surpassing traditional reconstruction-based and distillation-based methods.
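The idea of scoring masked regions through a distribution over discrete tokens can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the scoring rule, so the negative log-likelihood used here, the function name masked_token_anomaly_score, and the toy shapes are all assumptions chosen for illustration. The sketch assumes a model that outputs logits over a tokenizer codebook for each patch and a tokenizer that supplies the target token id for each patch.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the codebook axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def masked_token_anomaly_score(logits, target_ids, mask):
    """Hypothetical score: mean negative log-likelihood of the tokenizer's
    token ids, averaged over masked patches only.

    logits:     (num_patches, codebook_size) predicted logits
    target_ids: (num_patches,) integer token ids from the tokenizer
    mask:       (num_patches,) boolean, True where the patch was masked
    """
    logp = log_softmax(logits)
    nll = -logp[np.arange(len(target_ids)), target_ids]
    return float(nll[mask].mean())

# Toy data (assumed sizes): 16 patches, codebook of 8 tokens, half masked.
rng = np.random.default_rng(0)
num_patches, codebook_size = 16, 8
target_ids = rng.integers(0, codebook_size, size=num_patches)
mask = np.zeros(num_patches, dtype=bool)
mask[::2] = True

# A model that predicts the correct token confidently (a "normal" image).
confident = np.zeros((num_patches, codebook_size))
confident[np.arange(num_patches), target_ids] = 10.0

# A model that cannot predict the masked content (uniform distribution).
uniform = np.zeros((num_patches, codebook_size))

normal_score = masked_token_anomaly_score(confident, target_ids, mask)
anomalous_score = masked_token_anomaly_score(uniform, target_ids, mask)
```

In this toy setup the score is low when the masked tokens are predictable from context and rises when they are not, which is the behavior a logical anomaly (for example, an unexpected object combination) would be expected to trigger.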

Results

Task	Dataset	Metric	Value	Model
Anomaly Detection	MVTec LOCO AD	Avg. Detection AUROC	86	LADMIM
Anomaly Detection	MVTec LOCO AD	Detection AUROC (only logical)	83.1	LADMIM
Anomaly Detection	MVTec LOCO AD	Detection AUROC (only structural)	90.3	LADMIM
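
For readers unfamiliar with the metric, image-level detection AUROC can be computed from per-image anomaly scores and binary labels. The sketch below uses the standard pairwise (Mann-Whitney) formulation; the function name auroc and the example scores are illustrative assumptions, not values from the paper. It returns a value in the range 0 to 1, whereas the table above appears to report values on a 0 to 100 scale.

```python
import numpy as np

def auroc(scores, labels):
    """AUROC as the probability that a randomly chosen anomalous image
    scores higher than a randomly chosen normal image (ties count half)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = scores[labels]    # scores of anomalous images
    neg = scores[~labels]   # scores of normal images
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

# Illustrative scores: two normal images (label 0), two anomalous (label 1).
example = auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```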
