Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Unsupervised Anomaly Detection for Auditing Data and Impact of Categorical Encodings

Ajay Chawda, Stefanie Grimm, Marius Kloft

2022-10-25 | Density Estimation | Unsupervised Anomaly Detection | Anomaly Detection | Contrastive Learning

Abstract

In this paper, we introduce the Vehicle Claims dataset, consisting of fraudulent insurance claims for automotive repairs. The data belongs to the broader category of auditing data, which also includes journals and network intrusion data. Insurance claim data differ from other auditing data (such as network intrusion data) in their high number of categorical attributes. We tackle the common problem of missing benchmark datasets for anomaly detection: datasets are mostly confidential, and the public tabular datasets do not contain enough relevant categorical attributes. We therefore create a large dataset for this purpose, referred to as the Vehicle Claims (VC) dataset, and evaluate shallow and deep learning methods on it. Introducing categorical attributes raises the challenge of encoding them for a large dataset. Because One Hot encoding of a high-cardinality dataset invokes the "curse of dimensionality", we experiment with GEL encoding and an embedding layer for representing categorical attributes. Our work compares competitive learning, reconstruction-error, density estimation, and contrastive learning approaches, using Label, One Hot, and GEL encoding and an embedding layer to handle categorical values.
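The dimensionality concern raised in the abstract can be illustrated with a minimal sketch. It compares how many input columns Label encoding and One Hot encoding produce for a small categorical table. The column names and values in `toy_claims` are hypothetical and are not taken from the Vehicle Claims dataset, and the helper functions are written for illustration only, not drawn from the authors' code. GEL encoding and the embedding layer are not sketched here because this page does not describe them.

```python
from typing import Dict, List, Sequence


def label_encode(column: Sequence[str]) -> List[int]:
    """Map each distinct category to an integer index (sorted for determinism)."""
    mapping: Dict[str, int] = {cat: i for i, cat in enumerate(sorted(set(column)))}
    return [mapping[value] for value in column]


def one_hot_encode(column: Sequence[str]) -> List[List[int]]:
    """Map each value to a 0/1 vector with one position per distinct category."""
    categories = sorted(set(column))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in column:
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return rows


# Hypothetical toy data; NOT the real Vehicle Claims schema.
toy_claims = {
    "maker": ["audi", "bmw", "ford", "audi", "kia"],
    "color": ["red", "blue", "red", "black", "white"],
    "body_type": ["suv", "sedan", "suv", "hatchback", "sedan"],
}

# Label encoding keeps one integer column per categorical attribute.
label_width = len(toy_claims)

# One Hot encoding needs one column per distinct value of every attribute,
# so its width grows with the total cardinality of the table.
one_hot_width = sum(len(set(column)) for column in toy_claims.values())

print(f"Label encoding width:   {label_width}")
print(f"One Hot encoding width: {one_hot_width}")
```

With attributes that take thousands of distinct values, the One Hot width grows accordingly while the Label width stays at one column per attribute. Label encoding, however, imposes an arbitrary ordering on the categories.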

Results

Task                            Dataset         Metric  Value  Model
Anomaly Detection               Vehicle Claims  AUC     98.65  Random Forest
Anomaly Detection               Vehicle Claims  AUC     95.88  Gradient Boosting
Anomaly Detection               Vehicle Claims  AUC     65.43  SOM
Anomaly Detection               Vehicle Claims  AUC     59.42  Isolation Forest
Anomaly Detection               Vehicle Claims  AUC     58.59  Latent Outlier Exposure
Anomaly Detection               Vehicle Claims  AUC     57.03  NeuTraL-AD
Anomaly Detection               Vehicle Claims  AUC     55.38  RSRAE
Anomaly Detection               Vehicle Claims  AUC     53.82  SOM-DAGMM
Anomaly Detection               Vehicle Claims  AUC     52.86  Local Outlier Factor
Anomaly Detection               Vehicle Claims  AUC     51.68  One Class Support Vector Machines
Anomaly Detection               Vehicle Claims  AUC     51.22  DAGMM
Unsupervised Anomaly Detection  Vehicle Claims  AUC     65.43  SOM
Unsupervised Anomaly Detection  Vehicle Claims  AUC     59.42  Isolation Forest
Unsupervised Anomaly Detection  Vehicle Claims  AUC     58.59  Latent Outlier Exposure
Unsupervised Anomaly Detection  Vehicle Claims  AUC     57.03  NeuTraL-AD
Unsupervised Anomaly Detection  Vehicle Claims  AUC     55.38  RSRAE
Unsupervised Anomaly Detection  Vehicle Claims  AUC     53.82  SOM-DAGMM
Unsupervised Anomaly Detection  Vehicle Claims  AUC     52.86  Local Outlier Factor
Unsupervised Anomaly Detection  Vehicle Claims  AUC     51.68  One Class Support Vector Machines
Unsupervised Anomaly Detection  Vehicle Claims  AUC     51.22  DAGMM
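
For readers interpreting the AUC column, the following is a minimal sketch of how ROC AUC can be computed from anomaly scores and ground-truth labels. It uses the pairwise-ranking definition: the fraction of (anomaly, normal) pairs in which the anomaly receives the higher score, with ties counted as half. The function name `roc_auc` and the sample values are illustrative assumptions; the evaluation protocol actually used in the paper is not stated on this page. The table values appear to be on a 0 to 100 scale, where 50 would correspond to random ranking.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random anomaly (label 1)
    scores higher than a random normal point (label 0); ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one anomaly and one normal sample")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Illustrative values only; not taken from the paper.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(f"AUC: {100 * roc_auc(labels, scores):.2f}")
```

The quadratic loop is kept for clarity. For large datasets a sort-based rank computation, or a library routine, would be the more practical choice.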
