Tabular Data Generation using Binary Diffusion

Vitaliy Kinakh, Slava Voloshynovskiy

2024-09-20 | Tabular Data Generation

Abstract

Generating synthetic tabular data is critical in machine learning, especially when real data is limited or sensitive. Traditional generative models often face challenges due to the unique characteristics of tabular data, such as mixed data types and varied distributions, and require complex preprocessing or large pretrained models. In this paper, we introduce a novel, lossless binary transformation method that converts any tabular data into fixed-size binary representations, and a corresponding new generative model called Binary Diffusion, specifically designed for binary data. Binary Diffusion leverages the simplicity of XOR operations for noise addition and removal and employs binary cross-entropy loss for training. Our approach eliminates the need for extensive preprocessing, complex noise parameter tuning, and pretraining on large datasets. We evaluate our model on several popular tabular benchmark datasets, demonstrating that Binary Diffusion outperforms existing state-of-the-art models on Travel, Adult Income, and Diabetes datasets while being significantly smaller in size. Code and models are available at: https://github.com/vkinakh/binary-diffusion-tabular
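
To make the two ideas in the abstract concrete, here is a minimal sketch of a fixed-size binary encoding of one table row, followed by XOR noise addition and removal. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how columns are mapped to bits, so the sketch assumes each numeric column is stored as the 32 bits of a float32 and each categorical column as the binary digits of its category index. All function names (`float_to_bits`, `category_to_bits`, `encode_row`, `xor_noise`, and so on) are hypothetical, and the denoising network and its binary cross-entropy training are omitted.

```python
import numpy as np

FLOAT_BITS = 32  # assumption: one IEEE-754 float32 per numeric column


def float_to_bits(value):
    """Return the 32 bits of `value` stored as a big-endian float32."""
    raw = np.array([value], dtype=">f4").view(np.uint8)
    return np.unpackbits(raw)


def bits_to_float(bits):
    """Invert float_to_bits; the round trip is exact for float32 values."""
    return float(np.packbits(bits).view(">f4")[0])


def category_width(n_categories):
    """Number of bits needed to index n_categories distinct values."""
    return max(1, (n_categories - 1).bit_length())


def category_to_bits(index, n_categories):
    """Write a category index as fixed-width binary digits (MSB first)."""
    width = category_width(n_categories)
    digits = [(index >> i) & 1 for i in reversed(range(width))]
    return np.array(digits, dtype=np.uint8)


def bits_to_category(bits):
    """Invert category_to_bits."""
    value = 0
    for b in bits:
        value = (value << 1) | int(b)
    return value


def encode_row(number, category, n_categories):
    """Concatenate the bit vectors of one numeric and one categorical cell."""
    return np.concatenate(
        [float_to_bits(number), category_to_bits(category, n_categories)]
    )


def decode_row(bits):
    """Split the fixed-size bit vector and decode each cell."""
    return bits_to_float(bits[:FLOAT_BITS]), bits_to_category(bits[FLOAT_BITS:])


def xor_noise(bits, flip_prob, rng):
    """Flip each bit independently with probability flip_prob via XOR."""
    noise = (rng.random(bits.shape) < flip_prob).astype(np.uint8)
    return bits ^ noise, noise


# Usage: encode a row, corrupt it, then undo the corruption with the same mask.
rng = np.random.default_rng(0)
row_bits = encode_row(3.5, 2, n_categories=5)
noisy_bits, noise = xor_noise(row_bits, flip_prob=0.3, rng=rng)
recovered = noisy_bits ^ noise  # XOR is its own inverse
assert decode_row(recovered) == (3.5, 2)
```

Because XOR is its own inverse, applying the same noise mask a second time restores the original bits exactly. In a generative setting the mask is not known at sampling time, which is why a model has to be trained to predict it.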

Results

Task                     Dataset                    Metric                 Value   Model
Tabular Data Generation  SICK                       DT Accuracy            97.07   Binary Diffusion
Tabular Data Generation  SICK                       LR Accuracy            96.14   Binary Diffusion
Tabular Data Generation  SICK                       Parameters (M)         1.4     Binary Diffusion
Tabular Data Generation  SICK                       RF Accuracy            96.59   Binary Diffusion
Tabular Data Generation  HELOC                      DT Accuracy            70.25   Binary Diffusion
Tabular Data Generation  HELOC                      LR Accuracy            71.76   Binary Diffusion
Tabular Data Generation  HELOC                      Parameters (M)         2.6     Binary Diffusion
Tabular Data Generation  HELOC                      RF Accuracy            70.47   Binary Diffusion
Tabular Data Generation  California Housing Prices  DT Mean Squared Error  0.45    Binary Diffusion
Tabular Data Generation  California Housing Prices  LR Mean Squared Error  0.55    Binary Diffusion
Tabular Data Generation  California Housing Prices  Parameters (M)         1.5     Binary Diffusion
Tabular Data Generation  California Housing Prices  RF Mean Squared Error  0.39    Binary Diffusion
Tabular Data Generation  Travel                     DT Accuracy            88.9    Binary Diffusion
Tabular Data Generation  Travel                     LR Accuracy            83.79   Binary Diffusion
Tabular Data Generation  Travel                     Parameters (M)         1.1     Binary Diffusion
Tabular Data Generation  Travel                     RF Accuracy            89.95   Binary Diffusion
Tabular Data Generation  Diabetes                   DT Accuracy            0.5713  Binary Diffusion
Tabular Data Generation  Diabetes                   LR Accuracy            0.5775  Binary Diffusion
Tabular Data Generation  Diabetes                   Parameters (M)         1.8     Binary Diffusion
Tabular Data Generation  Diabetes                   RF Accuracy            0.5752  Binary Diffusion
Tabular Data Generation  Adult Census Income        DT Accuracy            85.27   Binary Diffusion
Tabular Data Generation  Adult Census Income        LR Accuracy            85.45   Binary Diffusion
Tabular Data Generation  Adult Census Income        Parameters (M)         1.4     Binary Diffusion
Tabular Data Generation  Adult Census Income        RF Accuracy            85.74   Binary Diffusion

Related Papers

CausalDiffTab: Mixed-Type Causal-Aware Diffusion for Tabular Data Generation (2025-06-17)
dpmm: Differentially Private Marginal Models, a Library for Synthetic Tabular Data Generation (2025-05-31)
The Prompt is Mightier than the Example (2025-05-24)
Graph Conditional Flow Matching for Relational Data Generation (2025-05-21)
A Note on Statistically Accurate Tabular Data Generation Using Large Language Models (2025-05-05)
A Comprehensive Survey of Synthetic Tabular Data Generation (2025-04-23)
Diffusion Transformers for Tabular Data Time Series Generation (2025-04-10)
TabRep: a Simple and Effective Continuous Representation for Training Tabular Diffusion Models (2025-04-07)