This repo is a fork of https://github.com/Lion-Mod/HR-Attrition. It contains bug fixes for that repo and reproduces similar results.

https://sdv.dev/SDV/user_guides/evaluation/evaluation_framework.html

  • "The output of this function call will be a number between 0 and 1 that will indicate us how similar the two tables are, being 0 the worst and 1 the best possible score."
    • This is incorrect, even in the documentation's own example.
  • Bugs
    • The SDV parameter names for CopulaGAN had to be updated.
    • ord_feats had to be fixed.
    • "\r" characters in the raw ipynb file crash the editor in Jupyter Notebook. I removed all of them with a Python script.
  • Methodology Issues
    • The original author used AUC to choose his first model, which was lr.
    • He also used AUC to choose his last model. catboost had the highest AUC, but he chose gbc, which had the second-highest AUC.
      • I tried gbc with synthetic + original data and with original data only, and found that synthetic + original data gives higher results.
    • Dataset differences
      • The dataset file provided in the repo is smaller than the linked Kaggle IBM one.
      • Both have a dimension of (1470, 35), so I think the difference comes from the compression applied when the data is stored on GitHub.
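The "\r" cleanup mentioned in the bug list above could look roughly like the sketch below. It is not the script actually used in this repo. It assumes the carriage returns live inside the notebook's JSON strings, and the names `strip_carriage_returns` and `clean_notebook`, as well as any file paths passed in, are hypothetical.

```python
import json


def strip_carriage_returns(node):
    """Recursively remove carriage-return characters from every string in parsed JSON."""
    if isinstance(node, str):
        return node.replace("\r", "")
    if isinstance(node, list):
        return [strip_carriage_returns(item) for item in node]
    if isinstance(node, dict):
        return {
            strip_carriage_returns(key): strip_carriage_returns(value)
            for key, value in node.items()
        }
    # Numbers, booleans and None are returned unchanged.
    return node


def clean_notebook(src_path, dst_path):
    """Read an .ipynb file, drop all "\r" characters, and write the result to dst_path."""
    # newline="" stops Python from translating line endings while reading.
    with open(src_path, "r", encoding="utf-8", newline="") as src:
        # strict=False tolerates raw control characters inside JSON strings.
        notebook = json.load(src, strict=False)

    cleaned = strip_carriage_returns(notebook)

    with open(dst_path, "w", encoding="utf-8", newline="\n") as dst:
        json.dump(cleaned, dst, indent=1, ensure_ascii=False)
        dst.write("\n")
```

Parsing the notebook as JSON, rather than doing a plain text replace, handles both escaped `\r` sequences inside cell sources and stray carriage returns between JSON tokens.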
| Data | Classifier | Accuracy | AUC | Recall | Precision | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|---|
| Original | lr | 0.8794 | 0.8534 | 0.4463 | 0.7006 | 0.5388 | 0.4746 | 0.4934 |
| Original + synth | lr | 0.8971 | 0.9564 | 0.8420 | 0.9512 | 0.8562 | 0.7964 | 0.8200 |
| Original | gbc | 0.8686 | 0.8195 | 0.3140 | 0.7010 | 0.4233 | 0.3648 | 0.4056 |
| Original + synth | gbc | 0.8971 | 0.9564 | 0.8420 | 0.9512 | 0.8562 | 0.7964 | 0.8200 |
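As a small illustration of the AUC-based comparison discussed above, the sketch below ranks the table rows by AUC and computes the AUC gain from adding synthetic data. The AUC values are copied from the table. The names `RESULTS`, `rank_by_auc` and `auc_gain` are hypothetical and not part of the original notebook.

```python
# (data, classifier, AUC) rows copied from the results table.
RESULTS = [
    ("Original", "lr", 0.8534),
    ("Original + synth", "lr", 0.9564),
    ("Original", "gbc", 0.8195),
    ("Original + synth", "gbc", 0.9564),
]


def rank_by_auc(rows):
    """Return rows sorted from highest to lowest AUC (ties keep their input order)."""
    return sorted(rows, key=lambda row: row[2], reverse=True)


def auc_gain(rows, classifier):
    """AUC on 'Original + synth' minus AUC on 'Original' for one classifier."""
    by_data = {data: auc for data, clf, auc in rows if clf == classifier}
    return by_data["Original + synth"] - by_data["Original"]


if __name__ == "__main__":
    for data, clf, auc in rank_by_auc(RESULTS):
        print(f"{clf:>4}  {data:<17} AUC={auc:.4f}")
    for clf in ("lr", "gbc"):
        print(f"{clf}: AUC gain from synthetic data = {auc_gain(RESULTS, clf):.4f}")
```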

lr = logistic regression

gbc = gradient boosting classifier