Differentially private synthesizers for tabular data. Package includes:
- MWEM
- QUAIL
- DP-CTGAN
- PATE-CTGAN
- PATE-GAN
pip install smartnoise-synth
# MWEM example
import pandas as pd
import numpy as np
import snsynth

# pums_csv_path must point to the PUMS CSV file (in datasets/)
pums = pd.read_csv(pums_csv_path, index_col=None)
pums = pums.drop(['income'], axis=1)
nf = pums.to_numpy().astype(int)  # fit on an integer array

synth = snsynth.MWEMSynthesizer(epsilon=1.0, split_factor=nf.shape[1])
synth.fit(nf)

sample = synth.sample(10)  # synthesize 10 rows
print(sample)
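Because the MWEM example fits on a bare array, the column names are lost along the way. The sketch below shows one way to attach them again, assuming the sample is a 2-D array whose columns keep the order of the input. The array and the column names here are stand-ins for illustration, not real output of the synthesizer.

```python
import numpy as np
import pandas as pd

# Stand-ins for illustration only: in the MWEM example these would be
# `synth.sample(10)` and `pums.columns` (assumption: the sample is a 2-D
# array with columns in the same order as the input).
sample = np.array([[1, 0, 3], [0, 1, 2]])
columns = ["age", "sex", "educ"]  # hypothetical column names

# Wrap the array in a DataFrame so each column is labeled again.
synthetic = pd.DataFrame(sample, columns=columns)
print(synthetic)
```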
# DP-CTGAN example
import pandas as pd
import numpy as np
from snsynth.pytorch.nn import DPCTGAN
from snsynth.pytorch import PytorchDPSynthesizer

# pums_csv_path must point to the PUMS CSV file (in datasets/)
pums = pd.read_csv(pums_csv_path, index_col=None)
pums = pums.drop(['income'], axis=1)

synth = PytorchDPSynthesizer(1.0, DPCTGAN(), None)  # epsilon = 1.0
synth.fit(pums, categorical_columns=pums.columns)  # treat every column as categorical

sample = synth.sample(10)  # synthesize 10 rows
print(sample)
# PATE-CTGAN example
import pandas as pd
import numpy as np
from snsynth.pytorch.nn import PATECTGAN
from snsynth.pytorch import PytorchDPSynthesizer

# pums_csv_path must point to the PUMS CSV file (in datasets/)
pums = pd.read_csv(pums_csv_path, index_col=None)
pums = pums.drop(['income'], axis=1)

synth = PytorchDPSynthesizer(1.0, PATECTGAN(regularization='dragan'), None)  # epsilon = 1.0
synth.fit(pums, categorical_columns=pums.columns)  # treat every column as categorical

sample = synth.sample(10)  # synthesize 10 rows
print(sample)
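MWEM, DP-CTGAN, and PATE-CTGAN require columns to be categorical. If you have columns with continuous values, you should discretize them before fitting. Take care to discretize in a way that does not reveal information about the distribution of the data: for example, choose bin edges from public knowledge of the domain rather than from the minimum, maximum, or quantiles of the private data.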
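One way to follow this advice is to fix the bin edges in advance, so that nothing about the binning depends on the private data. The sketch below is only an illustration: `AGE_EDGES` and the `discretize` helper are hypothetical and not part of the package, and the data frame is a stand-in.

```python
import pandas as pd

# Hypothetical bin edges fixed in advance from public knowledge of the
# domain (for example a survey codebook), NOT computed from the private data.
AGE_EDGES = [0, 18, 30, 45, 65, 120]


def discretize(series: pd.Series, edges: list) -> pd.Series:
    """Map continuous values to integer bin indices 0..len(edges)-2."""
    # Clip first so out-of-range values fall into the first or last bin
    # instead of becoming NaN.
    clipped = series.clip(lower=edges[0], upper=edges[-1])
    codes = pd.cut(clipped, bins=edges, labels=False, include_lowest=True)
    return codes.astype(int)


# Stand-in data for illustration only.
df = pd.DataFrame({"age": [5, 18, 29.5, 44, 70, 130]})
df["age_bin"] = discretize(df["age"], AGE_EDGES)
print(df)
```

Clipping to the fixed range avoids widening the bins to fit the observed minimum or maximum, which would itself depend on the private data. The resulting integer codes can then be passed to the synthesizer as a categorical column.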
- You are encouraged to join us on GitHub Discussions.
- Please use GitHub Issues for bug reports and feature requests.
- For other requests, including security issues, please contact us at smartnoise@opendp.org.
If you encounter a bug, please let us know by creating an issue.
We appreciate all contributions. Please review the contributors guide. Pull requests with bug fixes are welcome without prior discussion.
If you plan to contribute new features, utility functions or extensions to this system, please first open an issue and discuss the feature with us.