
Unlike synthetic data generators that rely on increasingly complex and resource-heavy architectures, TabularARGN adopts a more focused and efficient model design. These design choices result in:

High Fidelity: TabularARGN achieves synthetic data quality on par with state-of-the-art (SOTA) models.
Privacy by Design: TabularARGN only considers privacy-preserving value ranges for sampling and has built-in privacy protection features. It can also be trained via DP-SGD to obtain differential privacy guarantees.
Simplicity: TabularARGN leverages existing building blocks, and thus can be easily implemented within standard deep learning frameworks.
Compute Efficiency: With training speeds up to 100x faster, TabularARGN scales effectively, even for large and complex datasets.
Sampling Flexibility: TabularARGN supports advanced sampling capabilities, including:
- Conditional generation to create targeted datasets.
- Missing value imputation to handle incomplete data seamlessly.
- Fairness adjustments to align with ethical data synthesis goals.
- Temperature adjustments to control sampling probabilities, balancing rule adherence with data diversity.
Data Versatility: TabularARGN accommodates the heterogeneity of real-world tabular datasets, including:
- Multivariate, mixed-type data (categorical, numerical, datetime, geospatial).
- Multi-sequence datasets with varying sequence lengths and varying time intervals.
- Missing values.
Robustness in Training: TabularARGN delivers high-quality synthetic data with default settings and remains consistent across several training runs.
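
The temperature adjustment listed under Sampling Flexibility can be illustrated with a generic temperature-scaled softmax. This is a minimal sketch of the general technique, not TabularARGN's actual code: the function name `sample_with_temperature`, the example logits, and the use of NumPy are all assumptions made for illustration. Lower temperatures concentrate probability on the most likely category (closer adherence to learned patterns), while higher temperatures flatten the distribution (more diversity).

```python
import numpy as np


def sample_with_temperature(logits, temperature=1.0, rng=None):
    """Sample one category index from temperature-scaled softmax probabilities.

    Hypothetical helper for illustration only; not part of any TabularARGN API.
    Returns the sampled index and the probability vector it was drawn from.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = np.random.default_rng() if rng is None else rng

    # Divide the logits by the temperature, then apply a numerically stable softmax.
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled = scaled - scaled.max()
    probs = np.exp(scaled)
    probs = probs / probs.sum()

    index = int(rng.choice(len(probs), p=probs))
    return index, probs


# Made-up logits for a three-category column.
example_logits = [2.0, 1.0, 0.1]
rng = np.random.default_rng(0)
for t in (0.1, 1.0, 10.0):
    idx, probs = sample_with_temperature(example_logits, temperature=t, rng=rng)
    print(f"temperature={t}: probs={np.round(probs, 3)}, sampled index={idx}")
```

In this sketch, a temperature of 1.0 leaves the model's probabilities unchanged, so it serves as the neutral default.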
