Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


zGAN: An Outlier-focused Generative Adversarial Network For Realistic Synthetic Data Generation

Azizjon Azimi, Bonu Boboeva, Ilyas Varshavskiy, Shuhrat Khalilbekov, Akhlitdin Nizamitdinov, Najima Noyoftova, Sergey Shulgin

2024-10-28 | Synthetic Data Generation | Binary Classification | Synthetic Data Evaluation

Abstract

The phenomenon of "black swans" has posed a fundamental challenge to the performance of classical machine learning models. The perceived rise in the frequency of outlier conditions, especially in the post-pandemic environment, has necessitated the exploration of synthetic data as a complement to real data in model training. This article provides a general overview and experimental investigation of the zGAN model architecture, developed for the purpose of generating synthetic tabular data with outlier characteristics. The model is put to the test in binary classification environments and shows promising results on realistic synthetic data generation, as well as uplift capabilities vis-à-vis model performance. A distinctive feature of zGAN is its enhanced correlation capability between features in the generated data, replicating the correlations of features in the real training data. Also crucial is the ability of zGAN to generate outliers based on the covariance of real data or on synthetically generated covariances. This approach to outlier generation enables modeling of complex economic events and augmentation of outliers for tasks such as training predictive models and detecting, processing, or removing outliers. The experiments and comparative analyses in this study were conducted on both private (credit risk in financial services) and public datasets.
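The abstract does not describe how zGAN turns a covariance matrix into outliers, so the following is only an illustrative sketch of one generic covariance-based approach, not the authors' method. It assumes a Gaussian fit to the real data, inflates the covariance by a hypothetical `scale` factor, and keeps only samples whose Mahalanobis distance from the real-data mean exceeds a hypothetical `min_distance` threshold. The function name and all parameters are assumptions introduced for illustration.

```python
import numpy as np


def sample_covariance_outliers(real, n_samples, min_distance=3.0, scale=2.0, seed=0):
    """Sample outlier-like rows that follow the covariance structure of `real`.

    Illustrative only: this is a generic rejection sampler, not zGAN.
    `real` is an (n_rows, n_features) array with at least two features.
    """
    rng = np.random.default_rng(seed)
    mean = real.mean(axis=0)
    cov = np.cov(real, rowvar=False)      # feature-by-feature covariance
    cov_inv = np.linalg.pinv(cov)         # pseudo-inverse tolerates near-singular cov

    kept, total = [], 0
    while total < n_samples:
        # Draw from a Gaussian with the same shape but a wider spread.
        candidates = rng.multivariate_normal(mean, (scale ** 2) * cov, size=4 * n_samples)
        diff = candidates - mean
        # Mahalanobis distance of each candidate from the real-data mean.
        dist = np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))
        accepted = candidates[dist >= min_distance]
        kept.append(accepted)
        total += len(accepted)
    return np.vstack(kept)[:n_samples]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    real = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=2000)
    outliers = sample_covariance_outliers(real, n_samples=100)
    print(outliers.shape)
    print(np.corrcoef(outliers, rowvar=False)[0, 1])
```

Because the accepted points are drawn from a distribution with the same covariance shape as the real data, they sit far from the bulk of the data while still tending to preserve the feature correlations, which is the property the abstract emphasizes.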

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Synthetic Data Generation | Titanic | AUC | 0.8163 | zGAN |
| Synthetic Data Generation | Titanic | AUC | 0.8076 | CopulaGAN |
| Synthetic Data Generation | Titanic | AUC | 0.7923 | CTGAN |
| Synthetic Data Generation | Titanic | AUC | 0.7874 | TVAE |
| Synthetic Data Generation | Titanic | AUC | 0.7861 | SynthPop |
| Synthetic Data Generation | Titanic | AUC | 0.7846 | Gaussian Copula |
| Synthetic Data Generation | Titanic | AUC | 0.534 | PrivBayes |
| Synthetic Data Generation | A9 (7.4% outliers) | AUC | 0.7122 | zGAN |
| Synthetic Data Generation | A9 (3% outliers) | AUC | 0.7116 | zGAN |
| Synthetic Data Generation | A9 (5% outliers) | AUC | 0.7147 | zGAN |
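
The results are reported as AUC for binary classification. This page does not state the evaluation protocol behind those values, so the sketch below only illustrates how the metric itself can be computed from labels and classifier scores. The `roc_auc` helper is a hypothetical name, and the pairwise definition used here (the fraction of positive/negative pairs ranked correctly, with ties counted as half) is the standard one rather than anything specific to the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    `labels` holds 0/1 class labels and `scores` the predicted scores.
    Quadratic in the number of rows, so intended for small examples only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the values in the table.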

Related Papers

Lightweight Safety Guardrails via Synthetic Data and RL-guided Adversarial Training (2025-07-11)
An Automated Classifier of Harmful Brain Activities for Clinical Usage Based on a Vision-Inspired Pre-trained Framework (2025-07-10)
DocIE@XLLM25: In-Context Learning for Information Extraction using Fully Synthetic Demonstrations (2025-07-08)
DDL: A Dataset for Interpretable Deepfake Detection and Localization in Real-World Scenarios (2025-06-29)
How Good Are Synthetic Requirements? Evaluating LLM-Generated Datasets for AI4RE (2025-06-26)
Inverse Scene Text Removal (2025-06-26)
Divide, Specialize, and Route: A New Approach to Efficient Ensemble Learning (2025-06-25)
SoK: Can Synthetic Images Replace Real Data? A Survey of Utility and Privacy of Synthetic Image Generation (2025-06-24)