[ICLR 2025] TabWak

This is the repository for TabWak: A watermark for Tabular Diffusion Models.

TabWak is built on the TabSyn backbone, so its installation and usage closely follow TabSyn's. The installation steps below are adapted from TabSyn's instructions.

Installing Dependencies

Python version: 3.10

Step 1: Create Environment

conda create -n tabsyn python=3.10
conda activate tabsyn
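Once the environment is active, a quick sanity check can confirm that the interpreter matches the Python 3.10 requirement. This is a minimal sketch; the helper name `is_supported_python` is hypothetical and not part of the repository:

```python
import sys

# The setup instructions pin Python 3.10 (major, minor).
REQUIRED = (3, 10)


def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True if the interpreter's (major, minor) matches the pinned version."""
    return tuple(version_info[:2]) == REQUIRED


if __name__ == "__main__":
    current = sys.version.split()[0]
    if is_supported_python():
        print(f"OK: running Python {current}")
    else:
        print(f"Warning: expected Python {REQUIRED[0]}.{REQUIRED[1]}, got {current}")
```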

Step 2: Install PyTorch

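Using pip (pinned to the same versions as the conda command below, since the CUDA 11.7 wheels in Step 4 are built for PyTorch 2.0.1):

pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2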

Or via conda:

conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia

Step 3: Install Other Dependencies

pip install -r requirements.txt

Step 4: Install Dependencies for GOGGLE

pip install dgl -f https://data.dgl.ai/wheels/cu117/repo.html
pip install torch_geometric
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.0.1+cu117.html
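The `-f` index URLs above encode both the PyTorch version and the CUDA build (`torch-2.0.1+cu117`, `cu117`). If your PyTorch or CUDA version differs, the URLs must change to match. The sketch below derives them from the installed PyTorch; the helper names are hypothetical, and it assumes the PyG and DGL URL patterns shown in the commands above (PyG also publishes `+cpu` indices, while this sketch only builds DGL URLs for CUDA builds):

```python
from typing import Optional


def _cuda_tag(cuda_version: Optional[str]) -> str:
    """Map a CUDA version such as '11.7' to a wheel tag such as 'cu117' ('cpu' if None)."""
    if cuda_version is None:
        return "cpu"
    return "cu" + cuda_version.replace(".", "")


def pyg_wheel_url(torch_version: str, cuda_version: Optional[str]) -> str:
    """Build the PyG wheel index URL used with `pip install ... -f <url>`."""
    base = torch_version.split("+")[0]  # drop local suffix: '2.0.1+cu117' -> '2.0.1'
    return f"https://data.pyg.org/whl/torch-{base}+{_cuda_tag(cuda_version)}.html"


def dgl_wheel_url(cuda_version: str) -> str:
    """Build the DGL wheel index URL for a CUDA build (CUDA builds only in this sketch)."""
    return f"https://data.dgl.ai/wheels/{_cuda_tag(cuda_version)}/repo.html"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
    else:
        print(pyg_wheel_url(torch.__version__, torch.version.cuda))
        if torch.version.cuda is not None:
            print(dgl_wheel_url(torch.version.cuda))
```

Reading the versions from `torch.__version__` and `torch.version.cuda` avoids mixing wheels built for a different PyTorch or CUDA release, which is a common cause of import errors in `torch_scatter` and related packages.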

Step 5: Install Quality Metric Dependencies (synthcity)

Create a separate environment for the quality metrics:

conda create -n synthcity python=3.10
conda activate synthcity

pip install synthcity
pip install category_encoders
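Because the quality metrics live in their own environment, it is easy to run them from the wrong one. A minimal check, assuming the import names match the pip package names (true for these two packages); `missing_packages` is a hypothetical helper:

```python
import importlib.util

# Packages installed in the synthcity environment (Step 5).
QUALITY_PACKAGES = ("synthcity", "category_encoders")


def missing_packages(names):
    """Return the subset of top-level module `names` not importable in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(QUALITY_PACKAGES)
    if missing:
        print("Missing in this environment:", ", ".join(missing))
    else:
        print("Quality-metric dependencies found.")
```

`find_spec` only locates the module without importing it, so the check is fast even for heavy packages such as synthcity.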

Preparing Datasets

Using the Datasets from the Paper

Download the raw datasets:

python download_dataset.py

Process the dataset:

python process_dataset.py
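The two preparation scripts must run in order, from the repository root. A minimal driver sketch (the function `prepare_datasets` is hypothetical; the runner is injectable so the sequence can be checked without executing anything):

```python
import subprocess
import sys

# Preparation scripts from the repository root, in the order they must run.
PREP_SCRIPTS = ("download_dataset.py", "process_dataset.py")


def prepare_datasets(run=subprocess.run, python=sys.executable):
    """Download, then process; with the default runner, a failing step raises CalledProcessError."""
    commands = [[python, script] for script in PREP_SCRIPTS]
    for cmd in commands:
        run(cmd, check=True)
    return commands
```

Using `sys.executable` runs the scripts with the interpreter of the currently active conda environment rather than whatever `python` resolves to on the `PATH`.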

Training Models

TabWak uses the TabSyn backbone, which is trained in two stages:

Train the VAE model first:

python main.py --dataname [NAME_OF_DATASET] --method vae --mode train

After the VAE is trained, train the diffusion model:

python main.py --dataname [NAME_OF_DATASET] --method tabsyn --mode train
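The two stages can be chained so that the diffusion model is only trained after the VAE finishes successfully. A sketch built from the commands above; `training_commands` and `train` are hypothetical helpers, and `dataname` stands for the same name you passed as `[NAME_OF_DATASET]`:

```python
import subprocess
import sys


def training_commands(dataname, python=sys.executable):
    """Return the two TabSyn training stages in order: VAE first, then diffusion."""
    return [
        [python, "main.py", "--dataname", dataname, "--method", method, "--mode", "train"]
        for method in ("vae", "tabsyn")
    ]


def train(dataname, run=subprocess.run):
    """Run both stages; check=True stops before the diffusion stage if the VAE stage fails."""
    for cmd in training_commands(dataname):
        run(cmd, check=True)
```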

Watermarking During Sampling

To watermark the data during the sampling process, run:

python main.py --dataname [NAME_OF_DATASET] --method tabsyn --mode sample --steps 1000 --with_w [Name_of_Watermark]

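[Name_of_Watermark] options: treering, GS, TabWak, TabWak*. Quote TabWak* on the command line (for example --with_w 'TabWak*') so the shell does not expand the asterisk as a filename pattern.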
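When scripting the sampling step, validating the watermark name up front catches typos before a long sampling run starts. A sketch assuming the four option names listed above; `sample_command` is a hypothetical helper:

```python
import sys

# Watermark options accepted by --with_w, as listed in this README.
WATERMARKS = ("treering", "GS", "TabWak", "TabWak*")


def sample_command(dataname, watermark, steps=1000, python=sys.executable):
    """Build the watermarked sampling command as an argument list."""
    if watermark not in WATERMARKS:
        raise ValueError(f"unknown watermark {watermark!r}; expected one of {WATERMARKS}")
    return [python, "main.py", "--dataname", dataname, "--method", "tabsyn",
            "--mode", "sample", "--steps", str(steps), "--with_w", watermark]
```

Passing the resulting list to `subprocess.run` (without `shell=True`) hands `TabWak*` to `main.py` verbatim, so no quoting is needed.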

Watermark Detection

For watermark detection, use:

python main.py --dataname [NAME_OF_DATASET] --method tabsyn --mode detect --steps 1000 --with_w [Name_of_Watermark]
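Detection is run with the same `--with_w` value (and, in the commands above, the same `--steps`) as sampling. Building both commands from one place keeps them in sync. A sketch; `watermark_commands` and `sample_and_detect` are hypothetical helpers:

```python
import subprocess
import sys


def watermark_commands(dataname, watermark, steps=1000, python=sys.executable):
    """Return [sample_cmd, detect_cmd] sharing the same watermark and step count."""
    base = [python, "main.py", "--dataname", dataname, "--method", "tabsyn"]
    tail = ["--steps", str(steps), "--with_w", watermark]
    return [base + ["--mode", mode] + tail for mode in ("sample", "detect")]


def sample_and_detect(dataname, watermark, run=subprocess.run):
    """Sample watermarked data, then run detection; stops if sampling fails."""
    for cmd in watermark_commands(dataname, watermark):
        run(cmd, check=True)
```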

Attacks on Watermarked Data

To run attacks on watermarked data, use:

python main.py --dataname [NAME_OF_DATASET] --method tabsyn --mode detect --steps 1000 --with_w [Name_of_Watermark] --attack [Name_of_Attack_Options] --attack_percentage [0 to 1]

[Name_of_Attack_Options]: rowdeletion, celldeletion, celldeletetion, noise, shuffle
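Robustness is usually measured over several attacks and strengths. The sketch below builds one detection command per (attack, percentage) pair from the command above; `attack_commands` is a hypothetical helper, and the attack names are passed in by the caller rather than hard-coded, so use the option names listed above:

```python
import sys
from itertools import product


def attack_commands(dataname, watermark, attacks, percentages, steps=1000, python=sys.executable):
    """Build one attacked-detection command per (attack, percentage) pair."""
    commands = []
    for attack, pct in product(attacks, percentages):
        if not 0.0 <= pct <= 1.0:
            raise ValueError(f"attack_percentage must be in [0, 1], got {pct}")
        commands.append([
            python, "main.py", "--dataname", dataname, "--method", "tabsyn",
            "--mode", "detect", "--steps", str(steps), "--with_w", watermark,
            "--attack", attack, "--attack_percentage", str(pct),
        ])
    return commands
```

For example, `attack_commands(name, "TabWak", ["rowdeletion", "noise"], [0.1, 0.5])` yields four commands, which can then be run one by one with `subprocess.run(cmd, check=True)`.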

Repository: chaoyitud/TabWak