This repo is a fork of https://github.com/Lion-Mod/HR-Attrition. It fixes bugs in the original repo and reproduces similar results.

"The output of this function call will be a number between 0 and 1 that will indicate us how similar the two tables are, being 0 the worst and 1 the best possible score."
- This claim is incorrect, even for the example given in the documentation.

Bugs
- The SDV parameter names for CopulaGAN had to be updated.
- ord_feats had to be fixed.
- "\r" characters in the raw .ipynb file crashed the editor in Jupyter Notebook, so I removed all of them with a Python script.
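
The "\r" cleanup could look roughly like the sketch below. This is not the script actually used for this fork; it assumes the notebook is valid JSON, and the function names and file names are hypothetical. It parses the notebook, drops carriage-return characters from every string, and writes the result with plain "\n" line endings.

```python
import json
from pathlib import Path


def strip_carriage_returns(obj):
    """Recursively remove "\r" from every string in a JSON-like object."""
    if isinstance(obj, str):
        return obj.replace("\r", "")
    if isinstance(obj, list):
        return [strip_carriage_returns(item) for item in obj]
    if isinstance(obj, dict):
        return {key: strip_carriage_returns(value) for key, value in obj.items()}
    return obj  # numbers, booleans and None are left untouched


def clean_notebook(src, dst):
    """Read the notebook at src, strip "\r", and write the cleaned copy to dst."""
    notebook = json.loads(Path(src).read_text(encoding="utf-8"))
    cleaned = strip_carriage_returns(notebook)
    # newline="\n" keeps the output free of "\r" on every platform.
    with open(dst, "w", encoding="utf-8", newline="\n") as handle:
        json.dump(cleaned, handle, indent=1, ensure_ascii=False)
        handle.write("\n")
```

Usage would be something like `clean_notebook("original.ipynb", "cleaned.ipynb")`, where both file names are placeholders. Writing to a new file keeps the original available in case the cleaned copy needs checking.
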

Methodology Issues
- The original author used AUC to choose the first model, which was lr.
- By AUC, the last model would have been CatBoost, but the author chose gbc, which had the second-highest AUC.
  - I tried gbc with synthetic + original data and with original data only, and synthetic + original data gave higher results.
- Dataset differences
  - The dataset file in the original repo is smaller than the Kaggle IBM one it links to.
  - Both have dimensions (1470, 35), so I think the size difference comes from the compression applied when the data is stored on GitHub.
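
One way to check whether two copies of the dataset hold the same data despite different file sizes is sketched below. It uses only the standard library; the function names are hypothetical, and it does not identify the cause of the size difference. It only compares the parsed rows, so it is insensitive to differences such as line endings.

```python
import csv
from pathlib import Path


def csv_summary(path):
    """Return the byte size, (data rows, columns) shape, and parsed rows of a CSV file."""
    # newline="" lets the csv module handle "\r\n" and "\n" line endings itself.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        rows = list(csv.reader(handle))
    n_cols = len(rows[0]) if rows else 0
    return {
        "bytes": Path(path).stat().st_size,
        "shape": (max(len(rows) - 1, 0), n_cols),  # first row is treated as the header
        "rows": rows,
    }


def same_data(path_a, path_b):
    """True when both files parse to identical rows, regardless of size on disk."""
    return csv_summary(path_a)["rows"] == csv_summary(path_b)["rows"]
```

Calling `same_data("repo_copy.csv", "kaggle_copy.csv")` (both file names are placeholders) returning `True` would suggest the two files differ only in how they are stored, not in content.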

lr = logistic regression

gbc = gradient boosting classifier
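
The gbc comparison described under Methodology Issues (original data only versus synthetic + original data) can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the notebook's code: the data comes from `make_classification` rather than the attrition table, and the "synthetic" rows are noisy copies of the training rows standing in for CopulaGAN output. The test split contains original rows only, so both scores are measured on the same real data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def auc_for_training_set(X_train, y_train, X_test, y_test, seed=0):
    """Fit a gradient boosting classifier and return its ROC AUC on the test set."""
    model = GradientBoostingClassifier(random_state=seed)
    model.fit(X_train, y_train)
    # AUC is computed from the predicted probability of the positive class.
    scores = model.predict_proba(X_test)[:, 1]
    return roc_auc_score(y_test, scores)


# Placeholder data with an imbalanced binary target (NOT the attrition dataset).
X, y = make_classification(n_samples=600, n_features=10, weights=[0.84], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Placeholder "synthetic" rows: noisy copies of training rows, NOT CopulaGAN output.
rng = np.random.default_rng(0)
X_synth = X_train + rng.normal(scale=0.1, size=X_train.shape)
y_synth = y_train.copy()

auc_original = auc_for_training_set(X_train, y_train, X_test, y_test)
auc_augmented = auc_for_training_set(
    np.vstack([X_train, X_synth]),
    np.concatenate([y_train, y_synth]),
    X_test,
    y_test,
)
print(f"original only:        {auc_original:.3f}")
print(f"original + synthetic: {auc_augmented:.3f}")
```

The numbers printed here say nothing about the attrition results; the sketch only shows the shape of the comparison.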
