Welcome to the official implementation of SKADA-Bench: Benchmarking Unsupervised Domain Adaptation Methods with Realistic Validation
To reproduce the results in this paper, you have three options:
- Visualize stored results (Step 3): use the results stored in `/visualize/cleaned_outputs/`.
- Run the benchmark (Step 2): use the config files in `/config/datasets/`.
- Start from scratch (Step 1): generate new config files and proceed from there.
We provide all the files needed to run each step without having to run the previous one.
If you want to add new solvers, datasets, or scorers, just follow the instructions in the `CONTRIBUTE.md` file.
Note: The current implementation reproduces the results from the paper, with minor variations in some cases. Expect small code adjustments in the coming weeks to improve reproducibility. A revised version of the paper with near-exact reproducibility is also planned.
Note: This project requires Python 3.10.
To install the necessary requirements to run a benchmark, use the following commands:
- Ensure you have Python 3.10 installed. You can check your Python version with:

  ```bash
  python --version
  ```
- Install the `benchopt` library:

  ```bash
  pip install benchopt==1.6.0
  ```
- Install the desired datasets and solvers using `benchopt`. Specify the dataset and solver you want to use (e.g., the `simulated` dataset and the `bci` solver); a sketch restricting the installation to specific components is given after this list:

  ```bash
  benchopt install . [--download]
  ```
  Note: The `--download` flag is optional but highly recommended. It pre-downloads the datasets, which is particularly useful in the following scenarios:

  - When working on large clusters where internet access might be limited on computing nodes.
  - To avoid multiple processes attempting to download data simultaneously.
  - To ensure data is properly loaded when installing the benchmark.
- [NOT MANDATORY] Install the preprocessing, visualization, or full set of requirements:

  ```bash
  pip install -r preprocessing/requirements_preprocess.txt  # Install preprocessing dependencies
  pip install -r visualize/requirements_plot.txt            # Install plotting dependencies
  pip install -r requirements_all.txt                       # Install all dependencies
  ```
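As mentioned above, the installation can usually be restricted to specific components. The sketch below assumes benchopt's standard selection flags (`-d` for a dataset, `-s` for a solver); the component names are illustrative placeholders, so check `benchopt install --help` and the classes defined in this repository for the exact names:

```bash
# Minimal sketch: install only the requirements of a selected dataset and solver,
# and pre-download the corresponding data. "simulated" and "my_solver" are
# illustrative names, not necessarily the identifiers used in this repository.
benchopt install . -d simulated -s my_solver --download
```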
Generate the config file for selecting the base estimator on source:

```bash
python benchmark_utils/generate_config/generate_base_estim_config.py
```

This generates `config/find_best_base_estimators_per_dataset.yml`.
Run the base estimator experiments and store the results:

```bash
benchopt run --config config/find_best_base_estimators_per_dataset.yml --output base_estimators/results_base_estimators --no-plot --no-html
```

This generates `outputs/base_estimators/results_base_estimators`.
Extract the results and store them as a CSV file in `results_base_estimators/`:

```bash
python visualize/convert_benchopt_output_to_readable_csv.py --domain source --directory outputs/base_estimators --output results_base_estimators --file_name results_base_estim_experiments
```

This generates `results_base_estimators/results_base_estim_experiments.csv`.
Find the best base estimator per dataset and store them in `config/best_base_estimators.yml`:

```bash
python benchmark_utils/extract_best_base_estim.py
```

This generates `config/best_base_estimators.yml`.
Update the config file for each dataset with the best base estimator:

```bash
python benchmark_utils/generate_config/generate_config_per_dataset.py
```

This generates a config file for each dataset in `config/datasets/`.
To launch the benchmark for each dataset, run the following command:

```bash
benchopt run --config dataset.yml --timeout 3h --output output_directory/output_dataset --no-plot --no-html
```

- `dataset.yml`: config file of the specified dataset.
- `output_directory`: name of the output directory (`real_datasets` or `simulated_datasets`, depending on your data).
- `output_dataset`: name of the output result parquet/CSV file.
For example, for the simulated dataset:

```bash
benchopt run --config config/datasets/Simulated.yml --timeout 3h --output simulated_datasets/output_simulated --no-plot --no-html
```
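Similarly, a run on one of the real datasets writes its results under `real_datasets/`. The config file name below is an assumption (one of the per-dataset files generated in `config/datasets/` during Step 1); adjust it to the file actually present in your checkout:

```bash
# Sketch: run a real dataset. BCI.yml is an assumed config file name and
# output_bci an arbitrary output name chosen for illustration.
benchopt run --config config/datasets/BCI.yml --timeout 3h --output real_datasets/output_bci --no-plot --no-html
```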
Note: In the paper results, the timeout was set to 3 hours for shallow methods and 24 hours for deep methods. The `benchopt` framework supports running benchmarks in parallel on a SLURM cluster; for more details, refer to the Benchopt user guide.
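As an illustration, assuming benchopt's SLURM support is installed (see the Benchopt user guide) and that you have written a SLURM/submitit configuration file (here hypothetically named `slurm_config.yml`), a parallel run might look like:

```bash
# Sketch: dispatch the runs to a SLURM cluster through benchopt's --slurm option.
# slurm_config.yml is a user-provided configuration file, not shipped with this
# repository; its expected contents are described in the Benchopt user guide.
benchopt run --config config/datasets/Simulated.yml --slurm slurm_config.yml --timeout 3h --output simulated_datasets/output_simulated --no-plot --no-html
```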
Convert the `benchopt` output into a CSV format:

```bash
python visualize/convert_benchopt_output_to_readable_csv.py --directory outputs/simulated_datasets --domain target --file_name output_readable_dataset
```

This generates `visualize/cleaned_outputs/output_readable_dataset.csv`. This CSV file can then be used by anyone to plot the benchmarking results.
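The same script can be pointed at the real-dataset outputs. Assuming those runs were stored under `outputs/real_datasets` (as in Step 2), the call below uses a file name chosen to match the CSVs expected by the plotting commands in the next section:

```bash
# Sketch: convert the real-dataset benchopt outputs. The directory and file name
# are assumptions chosen to match the plotting commands below.
python visualize/convert_benchopt_output_to_readable_csv.py --directory outputs/real_datasets --domain target --file_name results_real_datasets_experiments
```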
In the `visualize` folder, run the following commands to generate the various results and plots:
- Main Result Table (Shallow):

  ```bash
  python plot_results_all_datasets.py --csv-file cleaned_outputs/results_real_datasets_experiments.csv --csv-file-simulated cleaned_outputs/results_simulated_datasets_experiments.csv
  ```

- Individual Tables per Dataset:

  ```bash
  python plot_results_per_dataset.py --csv-file cleaned_outputs/results_real_datasets_experiments.csv --dataset BCI
  ```

- Cross-val Score vs. Accuracy for Different Scorers:

  ```bash
  python plot_inner_score_vs_acc.py --csv-file cleaned_outputs/results_real_datasets_experiments.csv
  ```

- Accuracy of DA Methods using Unsupervised Scorers vs. Supervised Scorers:

  ```bash
  python plot_supervised_vs_unsupervised.py --csv-file cleaned_outputs/results_real_datasets_experiments.csv
  ```

- Change in Accuracy of DA Methods with Best Unsupervised Scorer vs. Supervised Scorer:

  ```bash
  python plot_boxplot.py --csv-file cleaned_outputs/results_real_datasets_experiments.csv
  ```

- Main Result Table (Deep):

  ```bash
  python plot_results_all_datasets_deep.py --csv-folder cleaned_outputs/ --scorer-selection unsupervised
  ```

- Mean Computing Time for Training and Testing Each Method:

  ```bash
  python visualize/get_computational_time.py --directory outputs
  ```
All the generated tables and plots can be found in the `visualize` folder.
Note: For the `get_computational_time` script, you need to provide the raw `benchopt` outputs directly; these are not included in the repository due to size limits (all other results are provided).
Happy benchmarking!