ML Project: Fraud Detection for STEG

This repo contains the Machine Learning Project of the neuefische Data Science, Machine Learning & AI Bootcamp 2024 in Hamburg. The team members are:

Tetyana Samoylenko: https://github.com/TetyanaSam
Christian Reimann: https://github.com/christian-reimann
Jakob Koscholke: https://github.com/jottemka

Our goal is to develop a fraud detection system for the Tunisian Company of Electricity and Gas (STEG), a public, non-administrative company responsible for delivering electricity and gas across Tunisia. The company has suffered losses on the order of 200 million Tunisian Dinars due to fraudulent manipulation of meters by consumers. Using the clients’ billing history, we want to provide a data product with the following key aspects:

Goal: identify clients involved in fraudulent activities, leave non-fraudulent clients aside
Value of Product: increase the company’s revenue by reducing the losses caused by fraudulent activities, avoid reputational damage
Evaluation Metric: ROC-AUC, but True Positive Rate and True Negative Rate will also be inspected
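
To make the three metrics concrete, here is a minimal sketch in plain Python. The labels and scores are made up for illustration, and the helper names `confusion_rates` and `roc_auc` are hypothetical, not functions from this repo. ROC-AUC is computed here through its pairwise interpretation: the probability that a randomly chosen fraud case receives a higher score than a randomly chosen non-fraud case.

```python
def confusion_rates(y_true, y_pred):
    """Return (true_positive_rate, true_negative_rate) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    positives = sum(1 for t in y_true if t == 1)
    negatives = len(y_true) - positives
    return tp / positives, tn / negatives


def roc_auc(y_true, scores):
    """ROC-AUC as the share of (fraud, non-fraud) pairs ranked correctly.

    Ties between a fraud score and a non-fraud score count as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (made up): 1 = fraud, 0 = not fraud.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
# Hard predictions at an arbitrary 0.5 threshold (assumption).
y_pred = [1 if s >= 0.5 else 0 for s in scores]

tpr, tnr = confusion_rates(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"TPR={tpr:.3f}  TNR={tnr:.3f}  ROC-AUC={auc:.3f}")
```

Note that ROC-AUC depends only on the ranking of the scores, while the True Positive Rate and True Negative Rate depend on the chosen threshold, which is why it is useful to inspect all three.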

Data

The data for this project can be found on Zindi:

https://zindi.africa/competitions/fraud-detection-in-electricity-and-gas-consumption-challenge/data

The following column documentation was provided by STEG. Unfortunately, it is sparse: some columns are left unexplained, and many explanations add little beyond the column name.

Client_id: Unique id for client
District: District where the client is
Client_catg: Category client belongs to
Region: Area where the client is
Creation_date: Date client joined
Target: fraud: 1, not fraud: 0
Invoice_date: Date of the invoice
Tarif_type: Type of tax
Counter_number:
Counter_statue: takes up to 5 values such as working fine, not working, on hold status, etc.
Counter_code:
Reading_remarque: notes that the STEG agent takes during a visit to the client (e.g., if the counter shows something wrong, the agent gives a bad score)
Counter_coefficient: An additional coefficient to be added when standard consumption is exceeded
Consommation_level_1: Consumption_level_1
Consommation_level_2: Consumption_level_2
Consommation_level_3: Consumption_level_3
Consommation_level_4: Consumption_level_4
Old_index: Old index
New_index: New index
Months_number: Month number
Counter_type: Type of counter
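
Since the target is defined per client while the billing history has one row per invoice, one possible way to connect the two is to aggregate invoices per client. The following is only a sketch under that assumption and not necessarily the approach taken in the notebooks. The table is made up, the feature names `n_invoices`, `mean_level_1` and `max_level_2` are hypothetical, and the column names follow the documentation above (the capitalisation in the actual files may differ).

```python
import pandas as pd

# Tiny made-up invoice table; one row per invoice.
invoices = pd.DataFrame(
    {
        "Client_id": ["c1", "c1", "c2", "c2", "c2"],
        "Consommation_level_1": [100, 120, 300, 0, 50],
        "Consommation_level_2": [0, 10, 200, 0, 0],
        "Months_number": [4, 4, 4, 4, 2],
    }
)

# Collapse the billing history to one row of features per client.
client_features = invoices.groupby("Client_id").agg(
    n_invoices=("Months_number", "count"),
    mean_level_1=("Consommation_level_1", "mean"),
    max_level_2=("Consommation_level_2", "max"),
)
print(client_features)
```

The resulting frame has one row per Client_id and could then be joined with the client-level columns such as District, Region and Target.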

Tested Classifiers

Model	ROC-AUC	True Positive Rate	True Negative Rate	Elapsed Time in Seconds
"Decision Tree"	0.813529	0.824833	0.801415	6.615412
"Random Forest"	0.867407	0.792444	0.771914	52.600034
"Extra Trees"	0.862694	0.784062	0.767552	50.269403
"AdaBoost"	0.659015	0.625867	0.605927	34.564976
"LightGBM"	0.727527	0.674011	0.649917	4.244878
"XGBoost"	0.763917	0.691182	0.687353	2.807551
"CatBoost"	0.782227	0.711931	0.695365	47.220879
"Naive Bayes"	0.598696	0.967253	0.08631	1.98632
"Logistic Regression"	0.623952	0.596349	0.58452	3.465537

Models excluded due to long processing time or excessive effort:

K-Nearest Neighbors (sklearn and faiss)
Support Vector Machines
Deep Neural Networks
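
A comparison table like the one above can be produced with a simple loop that fits each model, times it, and records the metrics. The sketch below assumes scikit-learn and uses a synthetic, imbalanced dataset from `make_classification` as a stand-in for the real data. The two models, their default settings, and the train/test split are illustrative assumptions, so the numbers it prints are not comparable to the table.

```python
import time

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the real data (assumption).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.9, 0.1], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

results = []
for name, model in models.items():
    # Time fitting plus prediction for each model.
    start = time.perf_counter()
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]
    pred = model.predict(X_test)
    elapsed = time.perf_counter() - start

    # For binary labels, ravel() yields tn, fp, fn, tp in this order.
    tn, fp, fn, tp = confusion_matrix(y_test, pred).ravel()
    results.append(
        {
            "model": name,
            "roc_auc": roc_auc_score(y_test, proba),
            "tpr": tp / (tp + fn),
            "tnr": tn / (tn + fp),
            "seconds": elapsed,
        }
    )

for row in results:
    print(row)
```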

Cross Validation

Model	ROC-AUC	True Positive Rate	True Negative Rate	min_samples_split	min_samples_leaf	max_depth	criterion	n_jobs	n_estimators	min_child_weight	learning_rate
"DecisionTree"	0.863855	0.806765	0.797137	16	1	46	"gini"	null	null	null	null
"RandomForest"	0.847049	0.772724	0.750683	10	1	26	null	-1	225	null	null
"ExtraTrees"	0.793471	0.716113	0.708899	19	1	26	"entropy"	-1	225	null	null
"XGBoost"	0.888584	0.829137	0.778349	null	null	11	null	-1	441	1	0.406

Decision Tree: this model combines a high ROC-AUC score (0.864) with a True Positive Rate of 0.807 and a True Negative Rate of 0.797, so both fraudulent and non-fraudulent activity are detected as such in many cases.
XGBoost: this model achieved an even higher ROC-AUC score (0.889) and True Positive Rate (0.829), at the cost of a lower True Negative Rate (0.778) than the Decision Tree.
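
The README does not state how the hyperparameters in the table were searched. As one plausible illustration, the sketch below tunes the Decision Tree parameters that appear in the table with scikit-learn's `RandomizedSearchCV`, scored by ROC-AUC. The synthetic data, the parameter ranges, the number of iterations, and the 5-fold setting are all assumptions.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the real data (assumption).
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.9, 0.1], random_state=0
)

# Search ranges are illustrative; randint(a, b) samples integers in [a, b).
param_distributions = {
    "min_samples_split": randint(2, 21),
    "min_samples_leaf": randint(1, 11),
    "max_depth": randint(2, 51),
    "criterion": ["gini", "entropy"],
}

search = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=10,          # number of sampled parameter combinations
    scoring="roc_auc",  # same headline metric as in the tables
    cv=5,               # 5-fold cross validation
    random_state=0,
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validated ROC-AUC:", round(search.best_score_, 3))
```

A randomized search samples a fixed number of combinations instead of trying every one, which keeps the run time bounded when the parameter ranges are wide.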

Environment Setup

macOS

To set up the virtual environment, you can either use the Makefile or install everything manually with the commands further below. With the Makefile, run:

make setup

After that, activate your environment with the following command:

source .venv/bin/activate

Alternatively, create the virtual environment and install the required packages manually with the following commands:

pyenv local 3.11.3
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

Windows

Create the virtual environment and install the required packages with the following commands.

For PowerShell CLI:

pyenv local 3.11.3
python -m venv .venv
.venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
pip install -r requirements.txt

For Git Bash CLI:

pyenv local 3.11.3
python -m venv .venv
source .venv/Scripts/activate
python -m pip install --upgrade pip
pip install -r requirements.txt

If you encounter an error when trying to run pip install --upgrade pip, try using the following command:

python.exe -m pip install --upgrade pip
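
After setup on either platform, a quick sanity check can confirm that the active interpreter is the one from the virtual environment and matches the pinned Python version. This is a small optional sketch using only the standard library; the helper names `in_virtualenv` and `version_matches` are hypothetical and not part of this repo.

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at .venv while sys.base_prefix
    # points at the underlying Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def version_matches(expected=(3, 11)) -> bool:
    """True when the major.minor version equals the expected one."""
    return sys.version_info[:2] == tuple(expected)


print("virtual environment active:", in_virtualenv())
print("Python 3.11 in use:", version_matches())
```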

