dpmm: Differentially Private Marginal Models, a Library for Synthetic Tabular Data Generation
Sofiane Mahiou, Amir Dizche, Reza Nazari, Xinmin Wu, Ralph Abbey, Jorge Silva, Georgi Ganev
Abstract
We propose dpmm, an open-source library for synthetic data generation with Differentially Private (DP) guarantees. It includes three popular marginal models -- PrivBayes, MST, and AIM -- that achieve superior utility and offer richer functionality compared to alternative implementations. Additionally, we adopt best practices to provide end-to-end DP guarantees and address well-known DP-related vulnerabilities. Our goal is to accommodate a wide audience with easy-to-install, highly customizable, and robust model implementations. Our codebase is available from https://github.com/sassoftware/dpmm.
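To make the idea of a DP marginal model concrete, the following is a minimal sketch of the simplest possible case: a single one-way marginal measured with the Laplace mechanism and then sampled from. It does not use the dpmm API, and the names `noisy_marginal` and `sample_synthetic` are hypothetical, chosen for illustration only. It assumes add/remove-one neighbouring datasets (so the histogram has L1 sensitivity 1) and a category domain fixed in advance rather than read from the data. PrivBayes, MST, and AIM go well beyond this by selecting and combining many low-dimensional marginals.

```python
import random
from collections import Counter


def noisy_marginal(values, domain, epsilon, rng):
    """Return a normalized one-way marginal with Laplace noise of scale 1/epsilon.

    Assumes add/remove-one neighbouring datasets, so the histogram has
    L1 sensitivity 1. `domain` must be fixed independently of the data.
    """
    counts = Counter(values)
    noisy = []
    for category in domain:
        # The difference of two Exp(rate=epsilon) draws is Laplace(0, 1/epsilon).
        noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
        # Clipping negatives is post-processing and does not affect the DP guarantee.
        noisy.append(max(counts[category] + noise, 0.0))
    total = sum(noisy)
    if total == 0.0:
        # Fall back to a uniform distribution if every noisy count was clipped.
        return [1.0 / len(domain)] * len(domain)
    return [c / total for c in noisy]


def sample_synthetic(domain, probs, n, rng):
    """Draw n synthetic values from the noisy marginal."""
    return rng.choices(domain, weights=probs, k=n)


rng = random.Random(0)
domain = ["A", "B", "C"]  # assumed public, not derived from the data
real = ["A"] * 60 + ["B"] * 30 + ["C"] * 10
probs = noisy_marginal(real, domain, epsilon=1.0, rng=rng)
synthetic = sample_synthetic(domain, probs, n=100, rng=rng)
```

This naive floating-point sampler is illustrative only and is not hardened against the kind of DP-related vulnerabilities the abstract alludes to; a robust implementation would need a vetted noise mechanism and careful handling of the data domain.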