Fairness Under Demographic Scarce Regime

This repository contains the code for the paper Fairness Under Demographic Scarce Regime (FairDSR). Demographic Scarce Regime refers to settings where demographic information (sensitive attribute) is not fully available. The paper studies the properties of the sensitive attribute classifier that can affect the fairness-accuracy tradeoffs of the downstream classifier. The paper demonstrates that applying fairness constraints on samples with a lower uncertainty in the sensitive attributes provides better results in terms of fairness-accuracy tradeoffs.


The project requires the following Python packages:

  • numpy
  • pandas
  • h5py
  • mpi4py
  • scikit-learn
  • tensorflow
  • folktables
  • fairlearn
  • tensorboard
  • torchvision

Other dependencies are listed in the requirements.txt file.

Experiments on fair classification with different sensitive attributes baselines


Download each dataset and store the preprocessed version in the folder preprocessing. For each dataset, create two CSV files, one per subset: $\mathcal{D}_1$ (without sensitive attributes) and $\mathcal{D}_2$ (with sensitive attributes), as described in the paper. The file src/ contains code to load each dataset split into the subsets $\mathcal{D}_1$ and $\mathcal{D}_2$. We provide a preprocessed version of the Adult dataset.
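As a rough illustration of the two-file layout, the sketch below splits a table into D1 (sensitive column removed) and D2 (sensitive column kept). The column name sex, the file names, and the 50/50 split are assumptions made for this example; they are not taken from the repository's preprocessing code.

```python
import csv
import random


def split_d1_d2(rows, sensitive_col, d2_fraction=0.5, seed=0):
    """Shuffle rows (a list of dicts) and split them into D1 and D2.

    D2 keeps the sensitive column; D1 has it removed.
    """
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_d2 = int(len(shuffled) * d2_fraction)
    d2 = shuffled[:n_d2]
    d1 = [{k: v for k, v in row.items() if k != sensitive_col}
          for row in shuffled[n_d2:]]
    return d1, d2


def write_csv(path, rows):
    """Write a list of dicts to a CSV file with a header row."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


# Toy rows; "sex" is a hypothetical sensitive column name.
rows = [{"age": 20 + i, "income": i % 2, "sex": i % 2} for i in range(10)]
d1, d2 = split_d1_d2(rows, sensitive_col="sex")
# write_csv("d1.csv", d1)  # hypothetical file names
# write_csv("d2.csv", d2)
```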

Quick demo using conformal predictions

Run the following command to train the fair model on samples whose sensitive attribute is predicted with low uncertainty, where uncertainty is measured using conformal prediction.

python3 src/ --sensitive_feature_type=cp --cp_alpha 0.05 --seed=1 --dataset adult --base_model lr --fair_metric dp
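To make the idea behind the cp option concrete, here is a minimal sketch of split conformal prediction for a binary sensitive attribute. It is a generic textbook version, not the repository's implementation: the function names, the score 1 - p(true label), and the toy numbers are all assumptions. Samples whose prediction set contains exactly one label are treated as certain.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Return the conformal quantile of the calibration scores.

    The nonconformity score is 1 - p(true label). With n calibration points,
    the threshold is the ceil((n + 1) * (1 - alpha))-th smallest score.
    """
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    # If k exceeds n, every label must be included; 1.0 bounds all scores.
    return scores[k - 1] if k <= n else 1.0


def prediction_set(probs, qhat):
    """Labels whose score 1 - p(label) does not exceed the threshold."""
    return [y for y, p in enumerate(probs) if 1.0 - p <= qhat]


# Toy calibration data (hypothetical attribute-classifier outputs).
cal_probs = [[0.9, 0.1], [0.2, 0.8], [0.95, 0.05], [0.3, 0.7], [0.85, 0.15],
             [0.4, 0.6], [0.9, 0.1], [0.25, 0.75], [0.55, 0.45]]
cal_labels = [0, 1, 0, 1, 0, 1, 0, 1, 0]

# A large alpha is used only because the toy calibration set is tiny.
qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.25)

test_probs = [[0.97, 0.03], [0.55, 0.45]]
sets = [prediction_set(p, qhat) for p in test_probs]
# Keep only samples with a singleton prediction set.
certain = [i for i, s in enumerate(sets) if len(s) == 1]
```

In this sketch, empty sets and sets with more than one label are both treated as uncertain and excluded.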

Sensitive attribute classifier with uncertainty awareness

The file src/ contains code to train the attribute classifier with uncertainty estimation.
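One simple way to attach an uncertainty score to a predicted sensitive attribute is the entropy of the predicted probability. The sketch below uses that score purely as a stand-in; the uncertainty estimator actually used in the repository may differ, and the function names and threshold are hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy (in bits) of a Bernoulli distribution with parameter p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_confident(probs, max_entropy):
    """Indices of samples whose predictive entropy is at most max_entropy."""
    return [i for i, p in enumerate(probs) if binary_entropy(p) <= max_entropy]


# Hypothetical predicted probabilities that the sensitive attribute equals 1.
probs = [0.98, 0.55, 0.10, 0.45, 0.02]
confident_idx = select_confident(probs, max_entropy=0.5)
```

Predictions near 0.5 have entropy close to 1 bit and are dropped; predictions near 0 or 1 are kept.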

Train fair model with predicted sensitive attributes

The file src/ contains code to train and evaluate fair models with different attribute classifier baselines (proxies).

On your local machine.

Assuming nbr_core is the number of cores you want to use:

cd src
mpiexec -n nbr_core python

On a HPC Cluster

Specify the number of cores in your submission file; srun will then use all the cores available:

cd src
srun python
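How the work is divided among MPI processes is not described here. One common pattern, assumed in this sketch, is to give each process a round-robin share of the runs (for example, seeds). The helper name is hypothetical; with mpi4py, the rank and size would typically come from MPI.COMM_WORLD.

```python
def tasks_for_rank(tasks, rank, size):
    """Round-robin assignment: rank r gets tasks r, r + size, r + 2 * size, ..."""
    return tasks[rank::size]


seeds = list(range(10))

# Under mpiexec or srun, rank and size would usually be obtained with:
#   from mpi4py import MPI
#   comm = MPI.COMM_WORLD
#   rank, size = comm.Get_rank(), comm.Get_size()
# Here four processes are simulated in a plain loop instead.
assignment = {rank: tasks_for_rank(seeds, rank, 4) for rank in range(4)}
```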


  • dataset (string): Specify the dataset to be used: adult, compas_race, new_adult, lsac_sex, celeba_attract.
  • seed (int): Random seed.
  • fair_metric (dp, eodds, eop): Fairness metric
    • dp: Demographic Parity
    • eop: Equal Opportunity
    • eodds: Equalized Odds
  • demographic_predictor (DNN, KNN): Model used to infer the sensitive attribute from the related features. This parameter is used when sensitive_feature_type is predicted.
    • DNN: uses an MLP-based attribute classifier
    • KNN: uses a KNN-based attribute classifier (imputation)
  • is_adv_method (boolean): Whether to use the adversarial debiasing method.
  • base_model (string): Base classifier for reduction methods.
    • lr: LogisticRegression
    • rf: RandomForest
    • gbm: GradientBoostingClassifier
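  • sensitive_feature_type (string): Source of the sensitive attribute used by the fairness mechanism.
    • clean: apply the fairness mechanism w.r.t. the ground-truth sensitive attribute.
    • ours: apply the fairness mechanism w.r.t. predicted sensitive attributes, restricted to samples predicted with low uncertainty.
    • cp: apply the fairness mechanism w.r.t. predicted sensitive attributes, restricted to samples predicted with low uncertainty as measured by conformal prediction.
    • predicted: apply the fairness mechanism w.r.t. the attributes predicted by the MLP- or KNN-based attribute classifier.
  • cp_alpha (float): Level α of the prediction sets used in conformal prediction (0.05 in the demo above). Under the usual conformal convention, the sets target a coverage of 1 − α.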
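For reference, the three fairness metrics listed above can be written as gaps between two groups. The sketch below is a plain-Python illustration with toy data, not the code used in the experiments. Equalized odds is taken here as the larger of the true-positive-rate and false-positive-rate gaps, which is one common convention.

```python
def positive_rate(y_pred, mask):
    """Fraction of positive predictions among samples selected by mask."""
    selected = [p for p, m in zip(y_pred, mask) if m]
    return sum(selected) / len(selected)


def group_rate_gap(y_true, y_pred, a, label=None):
    """Absolute gap in positive-prediction rate between groups a=0 and a=1.

    If label is given, only samples with y_true == label are considered.
    """
    rates = []
    for g in (0, 1):
        mask = [ai == g and (label is None or yi == label)
                for ai, yi in zip(a, y_true)]
        rates.append(positive_rate(y_pred, mask))
    return abs(rates[0] - rates[1])


def demographic_parity_diff(y_true, y_pred, a):
    return group_rate_gap(y_true, y_pred, a)            # gap in P(y_pred=1)


def equal_opportunity_diff(y_true, y_pred, a):
    return group_rate_gap(y_true, y_pred, a, label=1)   # gap in TPR


def equalized_odds_diff(y_true, y_pred, a):
    tpr_gap = group_rate_gap(y_true, y_pred, a, label=1)
    fpr_gap = group_rate_gap(y_true, y_pred, a, label=0)
    return max(tpr_gap, fpr_gap)


# Toy labels, predictions, and binary sensitive attribute.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
a      = [0, 0, 0, 0, 1, 1, 1, 1]

dp = demographic_parity_diff(y_true, y_pred, a)
eop = equal_opportunity_diff(y_true, y_pred, a)
eodds = equalized_odds_diff(y_true, y_pred, a)
```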

Other baselines with fairness constraints

Use the file src/ to train the target classifier with fairness mechanisms not supported by fairlearn.

python src/ --dataset $DATASET --baseline $BASELINE


  • baseline (ARL, DRO, CVAR, VANILLA, FAIRDA): Baseline used to train the target classifier.
    • ARL: train the classifier with Adversarially Reweighted Learning (ARL) by Lahoti et al. (2020).
    • DRO: train the classifier with a robust loss, distributionally robust optimization (DRO), by Hashimoto et al. (2018).
    • CVAR: train the classifier with a KL-regularized robust loss (fast DRO) by Levy et al. (2020).
    • FAIRDA: train the classifier with FAIRDA by Liang et al.
    • VANILLA: train the classifier without fairness constraints.
  • dataset (string): specify the dataset to be used: adult, compas_race, new_adult, lsac_sex, celeba_attract.
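To give some intuition for the robust-loss baselines, the sketch below computes a plain empirical CVaR: the average of the worst α-fraction of per-sample losses. This is a simplification for illustration only; it omits the KL regularization mentioned above and is not the repository's loss. The function name and the toy losses are hypothetical.

```python
import math


def cvar_loss(losses, alpha):
    """Empirical CVaR: mean of the ceil(alpha * n) largest losses."""
    k = max(1, math.ceil(alpha * len(losses)))
    worst = sorted(losses, reverse=True)[:k]
    return sum(worst) / k


# Toy per-sample losses.
losses = [0.1, 0.2, 0.9, 1.5, 0.3, 2.0, 0.4, 0.6]

robust = cvar_loss(losses, alpha=0.25)   # average of the 2 worst losses
average = cvar_loss(losses, alpha=1.0)   # reduces to the ordinary mean
```

Because only the hardest samples contribute, minimizing such a loss puts more weight on poorly served examples without requiring group labels.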

Baselines without fairness constraints

Use the file src/ to run the baselines without fairness constraints. It takes the parameters --base_model, --dataset, and --seed as described above.

Reproducing the analysis in the paper

Assuming that you have run the experiments for every baseline, the results for each baseline and each seed are stored in the folder output. To aggregate the results across seeds, run the file analysis/ with the argument --dataset specifying the dataset.
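Aggregation across seeds usually amounts to a mean and standard deviation per metric. The sketch below shows that computation on in-memory values. The metric names and numbers are hypothetical, and the actual format of the files in output is not assumed here.

```python
from statistics import mean, stdev


def aggregate(runs):
    """Given one dict of metrics per seed, return {metric: (mean, std)}."""
    summary = {}
    for metric in runs[0]:
        values = [run[metric] for run in runs]
        summary[metric] = (mean(values), stdev(values))
    return summary


# Hypothetical per-seed results for one baseline.
runs = [
    {"accuracy": 0.80, "dp_gap": 0.10},
    {"accuracy": 0.82, "dp_gap": 0.08},
    {"accuracy": 0.84, "dp_gap": 0.06},
]
summary = aggregate(runs)
```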

Creating the plots.

The notebook analysis/plots.ipynb has functions to plot the results in the paper. The plots are saved in the folder analysis/results/plots.

Ablation on uncertainty thresholds

For this experiment, use the file src/ to train a fair classifier at different uncertainty thresholds; the parameter --treshold_uncert sets the confidence threshold. The dataset is selected with the parameter --dataset as described above, and the results for each baseline and each seed are stored in the folder output/{dataset}/ablation.
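The trade-off explored in this ablation is that a stricter threshold keeps fewer, but more reliable, samples for the fairness constraint. The sketch below illustrates this on toy probabilities, using max(p, 1 - p) as the confidence score. Both the score and the threshold values are assumptions for illustration and may not match what --treshold_uncert expects.

```python
def retained_fraction(probs, threshold):
    """Fraction of samples whose confidence max(p, 1 - p) reaches threshold."""
    kept = [p for p in probs if max(p, 1 - p) >= threshold]
    return len(kept) / len(probs)


# Hypothetical predicted probabilities that the sensitive attribute equals 1.
probs = [0.98, 0.55, 0.10, 0.45, 0.02, 0.70, 0.85, 0.30]

# Sweep a few thresholds and record how much data survives each one.
sweep = {t: retained_fraction(probs, t) for t in (0.6, 0.75, 0.95)}
```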


If you use (parts of) this code, please cite:

title={Fairness Under Demographic Scarce Regime},
author={Patrik Joslin Kenfack and Samira Ebrahimi Kahou and Ulrich A{\"\i}vodji},
journal={Transactions on Machine Learning Research},

