Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Modeling Tabular data using Conditional GAN

Lei Xu, Maria Skoularidou, Alfredo Cuesta-Infante, Kalyan Veeramachaneni

2019-07-01 | NeurIPS 2019 12 | Tabular Data Generation

Abstract

Modeling the probability distribution of rows in tabular data and generating realistic synthetic data is a non-trivial task. Tabular data usually contains a mix of discrete and continuous columns. Continuous columns may have multiple modes, whereas discrete columns are sometimes imbalanced, which makes modeling difficult. Existing statistical and deep neural network models fail to properly model this type of data. We design TGAN, which uses a conditional generative adversarial network to address these challenges. To aid in a fair and thorough comparison, we design a benchmark with 7 simulated and 8 real datasets and several Bayesian network baselines. TGAN outperforms Bayesian methods on most of the real datasets, whereas other deep learning methods could not.
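The abstract names imbalanced discrete columns as one of the difficulties that conditioning is meant to address, but does not spell out the mechanism. The following is a minimal sketch of one way a condition could be drawn so that rare categories are seen more often during training. The log(1 + count) weighting, the function names, and the toy column are all assumptions made for illustration, not details taken from the abstract.

```python
import math
import random
from collections import Counter


def log_frequency_weights(values):
    """Return {category: probability}, proportional to log(1 + count).

    Assumption for illustration: damping counts with a logarithm so that
    rare categories are drawn more often than their raw frequency.
    """
    counts = Counter(values)
    logs = {cat: math.log(1 + n) for cat, n in counts.items()}
    total = sum(logs.values())
    return {cat: w / total for cat, w in logs.items()}


def sample_condition(values, rng):
    """Pick one category to condition on; return it with its one-hot vector."""
    weights = log_frequency_weights(values)
    categories = sorted(weights)
    chosen = rng.choices(
        categories, weights=[weights[c] for c in categories], k=1
    )[0]
    one_hot = [1 if c == chosen else 0 for c in categories]
    return chosen, one_hot


# Hypothetical imbalanced column: 95 "no" rows and 5 "yes" rows.
column = ["no"] * 95 + ["yes"] * 5
weights = log_frequency_weights(column)

rng = random.Random(0)
chosen, one_hot = sample_condition(column, rng)
print(weights)
print(chosen, one_hot)
```

In this toy column the minority category makes up 5% of the rows but receives a noticeably larger share of the sampling probability. That is the kind of rebalancing a conditional generator could rely on.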

Results

Task: Tabular Data Generation

| Dataset | Model | Metric | DT | LR | RF | Parameters (M) |
|---|---|---|---|---|---|---|
| SICK | TVAE | Accuracy | 95.39 | 94.7 | 94.91 | 0.046 |
| SICK | CopulaGAN | Accuracy | 93.77 | 94.57 | 94.57 | 0.226 |
| SICK | CTGAN | Accuracy | 92.05 | 94.44 | 94.57 | 0.222 |
| HELOC | TVAE | Accuracy | 76.39 | 71.04 | 77.24 | 62 |
| HELOC | CTGAN | Accuracy | 61.34 | 57.72 | 62.35 | 0.277 |
| HELOC | CopulaGAN | Accuracy | 42.36 | 42.03 | 42.35 | 0.276 |
| California Housing Prices | TVAE | Mean Squared Error | 0.45 | 0.65 | 0.35 | 0.045 |
| California Housing Prices | CTGAN | Mean Squared Error | 0.82 | 0.61 | 0.62 | 0.197 |
| California Housing Prices | CopulaGAN | Mean Squared Error | 1.19 | 0.98 | 0.99 | 0.201 |
| Travel | TVAE | Accuracy | 81.68 | 79.58 | 81.68 | 0.036 |
| Travel | CopulaGAN | Accuracy | 73.61 | 73.3 | 73.3 | 0.157 |
| Travel | CTGAN | Accuracy | 73.3 | 73.3 | 71.41 | 0.155 |
| Diabetes | TVAE | Accuracy | 0.533 | 0.5634 | 0.5517 | 0.359 |
| Diabetes | CTGAN | Accuracy | 0.4973 | 0.5093 | 0.5223 | 9.6 |
| Diabetes | CopulaGAN | Accuracy | 0.385 | 0.4027 | 0.3759 | 9.4 |
| Adult Census Income | TVAE | Accuracy | 82.8 | 80.53 | 83.48 | 0.053 |
| Adult Census Income | CTGAN | Accuracy | 81.32 | 83.2 | 83.53 | 0.302 |
| Adult Census Income | CopulaGAN | Accuracy | 76.29 | 80.61 | 80.46 | 0.3 |
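The results above mix two metric directions: accuracy, where higher is better, and mean squared error, where lower is better. A short sketch like the following may help when comparing models across datasets. It uses only the RF values listed above. The dictionary layout and the function name `best_models` are illustrative choices, not part of the source page.

```python
# RF values copied from the results listed above.
# "accuracy": higher is better; "mse": lower is better.
RF_SCORES = {
    "SICK": ("accuracy", {"TVAE": 94.91, "CopulaGAN": 94.57, "CTGAN": 94.57}),
    "HELOC": ("accuracy", {"TVAE": 77.24, "CTGAN": 62.35, "CopulaGAN": 42.35}),
    "California Housing Prices": (
        "mse",
        {"TVAE": 0.35, "CTGAN": 0.62, "CopulaGAN": 0.99},
    ),
    "Travel": ("accuracy", {"TVAE": 81.68, "CopulaGAN": 73.3, "CTGAN": 71.41}),
    "Diabetes": (
        "accuracy",
        {"TVAE": 0.5517, "CTGAN": 0.5223, "CopulaGAN": 0.3759},
    ),
    "Adult Census Income": (
        "accuracy",
        {"TVAE": 83.48, "CTGAN": 83.53, "CopulaGAN": 80.46},
    ),
}


def best_models(scores):
    """Return {dataset: [best model(s)]}, respecting each metric's direction."""
    best = {}
    for dataset, (metric, by_model) in scores.items():
        pick = min if metric == "mse" else max
        best_value = pick(by_model.values())
        # Keep every model that ties for the best value.
        best[dataset] = sorted(m for m, v in by_model.items() if v == best_value)
    return best


winners = best_models(RF_SCORES)
for dataset, models in winners.items():
    print(f"{dataset}: {', '.join(models)}")
```

Note that the Diabetes values are reported as fractions while the other accuracy values are percentages. The comparison is done within each dataset, so the difference in scale does not affect the ranking.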

Related Papers

CausalDiffTab: Mixed-Type Causal-Aware Diffusion for Tabular Data Generation (2025-06-17)
dpmm: Differentially Private Marginal Models, a Library for Synthetic Tabular Data Generation (2025-05-31)
The Prompt is Mightier than the Example (2025-05-24)
Graph Conditional Flow Matching for Relational Data Generation (2025-05-21)
A Note on Statistically Accurate Tabular Data Generation Using Large Language Models (2025-05-05)
A Comprehensive Survey of Synthetic Tabular Data Generation (2025-04-23)
Diffusion Transformers for Tabular Data Time Series Generation (2025-04-10)
TabRep: a Simple and Effective Continuous Representation for Training Tabular Diffusion Models (2025-04-07)