TabularARGN

Tabular Auto-Regressive Generative Network

General · Introduced 2000 · 1 paper

Description

Unlike synthetic data generators that rely on increasingly complex and resource-heavy architectures, TabularARGN adopts a more focused and efficient model design. These design choices result in:

  • High Fidelity: TabularARGN achieves synthetic data quality on par with state-of-the-art (SOTA) models.
  • Privacy by Design: TabularARGN only considers privacy-preserving value ranges for sampling and has built-in privacy protection features. It can also be trained via DP-SGD to obtain differential privacy guarantees.
  • Simplicity: TabularARGN leverages existing building blocks, and thus can be easily implemented within standard deep learning frameworks.
  • Compute Efficiency: With training speeds up to 100x faster, TabularARGN scales effectively, even for large and complex datasets.
  • Sampling Flexibility: TabularARGN supports advanced sampling capabilities, including:
    • Conditional generation to create targeted datasets.
    • Missing value imputation to handle incomplete data seamlessly.
    • Fairness adjustments to align with ethical data synthesis goals.
    • Controlling sampling probabilities via temperature adjustments to balance rule-adherence with data diversity.
  • Data Versatility: TabularARGN accommodates the heterogeneity of real-world tabular datasets, including:
    • Multi-variate, mixed-type data (categorical, numerical, date-time, geo-spatial).
    • Multi-sequence datasets with varying sequence lengths and varying time intervals.
    • Missing values.
  • Robustness in Training: TabularARGN delivers high-quality synthetic data with default settings and remains consistent across several training runs.
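The temperature and conditional-generation features listed above can be illustrated with a small sketch. It is not TabularARGN's implementation or API: the two-column probability tables and the names `apply_temperature`, `sample_categorical`, and `sample_row` are hypothetical stand-ins for what a trained auto-regressive model would provide. The sketch only shows the general idea of sampling columns one at a time, each conditioned on the columns already drawn, with a temperature that sharpens or flattens each distribution.

```python
import math
import random


def apply_temperature(probs, temperature):
    """Rescale a categorical distribution to p_i ** (1 / T), renormalized.

    T < 1 concentrates mass on likely values, T > 1 flattens the distribution.
    Computed in log space for numerical stability.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    logits = [math.log(p) / temperature if p > 0 else float("-inf") for p in probs]
    top = max(logits)
    weights = [math.exp(logit - top) for logit in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical toy "model": P(segment) and P(plan | segment).
# A real auto-regressive model would predict these with learned networks.
SEGMENT_PROBS = {"retail": 0.7, "business": 0.3}
PLAN_GIVEN_SEGMENT = {
    "retail": {"basic": 0.8, "premium": 0.2},
    "business": {"basic": 0.1, "premium": 0.9},
}


def sample_categorical(dist, temperature, rng):
    """Draw one value from a {value: probability} table at the given temperature."""
    values = list(dist)
    weights = apply_temperature([dist[v] for v in values], temperature)
    return rng.choices(values, weights=weights, k=1)[0]


def sample_row(temperature=1.0, fixed=None, rng=None):
    """Sample columns in order; columns present in `fixed` are kept as given."""
    rng = rng or random.Random()
    fixed = fixed or {}
    row = {}
    # First column: either supplied by the caller or sampled unconditionally.
    row["segment"] = fixed.get("segment") or sample_categorical(
        SEGMENT_PROBS, temperature, rng
    )
    # Second column: conditioned on the value already present in the row.
    row["plan"] = fixed.get("plan") or sample_categorical(
        PLAN_GIVEN_SEGMENT[row["segment"]], temperature, rng
    )
    return row


if __name__ == "__main__":
    rng = random.Random(0)
    # Conditional generation: fix the segment, sample the remaining column.
    print(sample_row(temperature=0.5, fixed={"segment": "business"}, rng=rng))
```

In this sketch, a lower temperature favors the most probable value in each column (closer rule adherence), while a higher temperature spreads probability toward rarer values (more diversity). Passing `fixed` pins some columns and samples the rest, which is the basic shape of conditional generation and, for rows with gaps, of imputation.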
