Fraud Detection System for STEG. ML project for the neuefische Data Science, Machine Learning & AI Bootcamp 2024 in Hamburg.

jottemka/ml_project_neuefische

ML Project: Fraud Detection for STEG

This repo contains the Machine Learning Project of the neuefische Data Science, Machine Learning & AI Bootcamp 2024 in Hamburg. The team members are:

Our goal is to develop a fraud detection system for the Tunisian Company of Electricity and Gas (STEG), a public, non-administrative company responsible for delivering electricity and gas across Tunisia. The company has suffered losses on the order of 200 million Tunisian dinars due to consumers fraudulently manipulating their meters. Using the clients' billing history, we want to provide a data product with the following key aspects:

  • Goal: identify clients involved in fraudulent activities while leaving non-fraudulent clients aside
  • Value of Product: increase the company's revenues by reducing the losses caused by fraudulent activities, and avoid reputational damage
  • Evaluation Metric: ROC-AUC; the True Positive Rate and True Negative Rate are also inspected
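
The README names the metrics but does not show how they are computed. The following is a minimal pure-Python sketch of the three quantities, assuming binary labels (fraud = 1) and a fixed decision threshold of 0.5 for the two rates; the threshold and the toy scores are illustrative assumptions, not values from the project. In practice, sklearn.metrics.roc_auc_score and sklearn.metrics.confusion_matrix yield the same numbers.

```python
from typing import Sequence, Tuple


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random fraud case (1) gets a higher
    score than a random non-fraud case (0); ties count as half a win."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one fraud and one non-fraud example")
    wins = 0.0
    # O(n_pos * n_neg) pairwise comparison: fine for a sketch, slow for the full dataset
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def tpr_tnr(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """True Positive Rate (share of fraud cases caught) and
    True Negative Rate (share of honest clients correctly left aside)."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Toy example (made-up scores), thresholded at an assumed 0.5
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]
print(roc_auc(y_true, y_score))  # 0.75
print(tpr_tnr(y_true, y_pred))   # (0.5, 1.0)
```

ROC-AUC does not depend on a threshold, whereas both rates shift as the threshold moves, which is why it is useful to inspect all three.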

Data

The data for this project can be found on Zindi:

The following column documentation was provided by STEG. Unfortunately, it is incomplete: some columns are left unexplained, and most explanations add little beyond the column name.

  1. Client_id: Unique id for client
  2. District: District where the client is
  3. Client_catg: Category client belongs to
  4. Region: Area where the client is
  5. Creation_date: Date client joined
  6. Target: fraud: 1, not fraud: 0
  7. Invoice_date: Date of the invoice
  8. Tarif_type: Type of tax
  9. Counter_number:
  10. Counter_statue: takes up to 5 values, such as working fine, not working, on hold, etc.
  11. Counter_code:
  12. Reading_remarque: notes that the STEG agent takes during the visit to the client (e.g., if the counter shows something wrong, the agent gives a bad score)
  13. Counter_coefficient: An additional coefficient to be added when standard consumption is exceeded
  14. Consommation_level_1: Consumption_level_1
  15. Consommation_level_2: Consumption_level_2
  16. Consommation_level_3: Consumption_level_3
  17. Consommation_level_4: Consumption_level_4
  18. Old_index: Old index
  19. New_index: New index
  20. Months_number: Month number
  21. Counter_type: Type of counter
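
The column list mixes client-level fields (Client_id, District, Region, Target, ...) with invoice-level fields (Invoice_date, the consumption levels, the meter indices). Since the goal is to flag clients, the invoices have to be condensed to one row per client before training. A minimal sketch, assuming each invoice is available as a dict keyed by the column names above; the feature names (n_invoices, mean_total_consumption, ...) and the helper make_invoice are hypothetical and not taken from the project:

```python
from collections import defaultdict
from statistics import mean

LEVELS = ["Consommation_level_1", "Consommation_level_2",
          "Consommation_level_3", "Consommation_level_4"]


def aggregate_invoices(invoices):
    """Collapse invoice-level rows into one feature dict per Client_id.

    Assumption: each invoice is a dict holding at least Client_id, the four
    Consommation_level_* columns, Old_index and New_index.
    """
    by_client = defaultdict(list)
    for row in invoices:
        by_client[row["Client_id"]].append(row)

    features = {}
    for client_id, rows in by_client.items():
        # Total billed consumption per invoice (sum over the four levels)
        totals = [sum(r[col] for col in LEVELS) for r in rows]
        # Meter advance between two readings
        index_diffs = [r["New_index"] - r["Old_index"] for r in rows]
        features[client_id] = {
            "n_invoices": len(rows),
            "mean_total_consumption": mean(totals),
            "max_total_consumption": max(totals),
            "mean_index_diff": mean(index_diffs),
        }
    return features


def make_invoice(client_id, levels, old_index, new_index):
    """Hypothetical helper that builds one toy invoice row."""
    row = dict(zip(LEVELS, levels))
    row.update(Client_id=client_id, Old_index=old_index, New_index=new_index)
    return row


invoices = [
    make_invoice("A", [100, 0, 0, 0], 0, 100),
    make_invoice("A", [300, 50, 0, 0], 100, 450),
    make_invoice("B", [10, 0, 0, 0], 5, 15),
]
features = aggregate_invoices(invoices)
print(features["A"])
```

The resulting per-client features could then be joined to the client-level fields on Client_id to attach Target.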

Tested Classifiers

| Model | ROC-AUC | True Positive Rate | True Negative Rate | Elapsed Time in Seconds |
|---|---|---|---|---|
| Decision Tree | 0.813529 | 0.824833 | 0.801415 | 6.615412 |
| Random Forest | 0.867407 | 0.792444 | 0.771914 | 52.600034 |
| Extra Trees | 0.862694 | 0.784062 | 0.767552 | 50.269403 |
| AdaBoost | 0.659015 | 0.625867 | 0.605927 | 34.564976 |
| LightGBM | 0.727527 | 0.674011 | 0.649917 | 4.244878 |
| XGBoost | 0.763917 | 0.691182 | 0.687353 | 2.807551 |
| CatBoost | 0.782227 | 0.711931 | 0.695365 | 47.220879 |
| Naive Bayes | 0.598696 | 0.967253 | 0.08631 | 1.98632 |
| Logistic Regression | 0.623952 | 0.596349 | 0.58452 | 3.465537 |

Models excluded due to long processing times or excessive effort:

  1. K-Nearest Neighbors (sklearn and faiss)
  2. Support Vector Machines
  3. Deep Neural Networks

Cross Validation

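| Model | ROC-AUC | True Positive Rate | True Negative Rate | min_samples_split | min_samples_leaf | max_depth | criterion | n_jobs | n_estimators | min_child_weight | learning_rate |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Decision Tree | 0.863855 | 0.806765 | 0.797137 | 16 | 14 | 6 | gini | null | null | null | null |
| Random Forest | 0.847049 | 0.772724 | 0.750683 | 10 | 12 | 6 | null | -1 | 225 | null | null |
| Extra Trees | 0.793471 | 0.716113 | 0.708899 | 19 | 12 | 6 | entropy | -1 | 225 | null | null |
| XGBoost | 0.888584 | 0.829137 | 0.778349 | null | null | 11 | null | -1 | 441 | 1 | 0.406 |
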
  • Decision Tree: strikes a good balance between a high ROC-AUC score (0.864) and reliably classifying both fraudulent and non-fraudulent clients (True Positive Rate of about 0.81, True Negative Rate of about 0.80).

  • XGBoost: achieves an even higher ROC-AUC score (0.889) and True Positive Rate (0.829), at the cost of a lower True Negative Rate (0.778).
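
The README does not state which cross-validation scheme was used. For a binary fraud target, stratified k-fold splitting is a common choice because every fold keeps roughly the same fraud / non-fraud ratio as the full dataset. The sketch below shows that technique in pure Python under this assumption; stratified_kfold is a hypothetical helper. scikit-learn's StratifiedKFold implements the same idea and can be passed as cv to GridSearchCV or RandomizedSearchCV together with scoring="roc_auc".

```python
import random
from collections import defaultdict
from typing import Iterator, List, Sequence, Tuple


def stratified_kfold(
    y: Sequence[int], n_splits: int = 5, seed: int = 42
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs whose test folds keep
    roughly the same class ratio as y."""
    if n_splits < 2:
        raise ValueError("n_splits must be at least 2")
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(n_splits)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal this class round-robin; carrying the offset across classes keeps fold sizes balanced
        for pos, idx in enumerate(indices):
            folds[(offset + pos) % n_splits].append(idx)
        offset += len(indices)

    for k in range(n_splits):
        test = sorted(folds[k])
        test_set = set(test)
        train = [i for i in range(len(y)) if i not in test_set]
        yield train, test
```

Averaging the ROC-AUC over the test folds gives one cross-validated score per hyperparameter setting, which is how candidate settings like those in the table above can be compared.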

Environment Setup

macOS

To set up the virtual environment, you can either use the Makefile or install everything manually. With the Makefile, run:

make setup

Then activate the environment:

source .venv/bin/activate

Alternatively, create the virtual environment and install the required packages manually:

pyenv local 3.11.3
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

Windows

Create the virtual environment and install the required packages with the following commands.

For PowerShell:

pyenv local 3.11.3
python -m venv .venv
.venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
pip install -r requirements.txt

For Git Bash:

pyenv local 3.11.3
python -m venv .venv
source .venv/Scripts/activate
python -m pip install --upgrade pip
pip install -r requirements.txt

If you encounter an error when trying to run pip install --upgrade pip, try using the following command:

python.exe -m pip install --upgrade pip
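
On either platform, a short Python check can confirm that the interpreter in use belongs to a virtual environment and matches the 3.11 release pinned via pyenv. This is a hypothetical helper (for example saved as check_env.py), not a script shipped with the repository; it relies on sys.prefix differing from sys.base_prefix when running inside a venv.

```python
import sys


def check_environment(expected_version=(3, 11)):
    """Report whether a virtual environment is in use and the Python version matches."""
    # Inside a venv, sys.prefix points to .venv while sys.base_prefix points to the base interpreter
    in_venv = sys.prefix != sys.base_prefix
    version_ok = tuple(sys.version_info[:2]) == tuple(expected_version)
    return {
        "python": sys.version.split()[0],
        "in_venv": in_venv,
        "version_ok": version_ok,
    }


if __name__ == "__main__":
    status = check_environment()
    print(status)
    if not status["in_venv"]:
        print("No virtual environment in use: activate .venv first.")
```

Run it with python check_env.py; it compares only the major and minor version, so any 3.11.x patch release passes.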

Contributors 3
