This is a simple project that builds a basic Artificial Neural Network to predict whether bank customers will keep or close their accounts.
The dataset used is an artificial dataset containing details about bank customers. The raw dataset file can be fetched from this folder.
This dataset is labeled and contains information about bank customers, including their credit score, geographic location, gender, age, tenure with the bank, account balance, number of products held, credit card status, membership status, estimated salary, and whether they exited the bank (presumably closed their account).
The project is implemented in two distinct steps corresponding to the essential data-processing and model-training phases.
- Each step is represented by a corresponding notebook inside the notebooks folder.
- Intermediate data files are stored inside the data folder.
Corresponding notebook: data-preprocessing.ipynb
Implemented data exploration and cleaning tasks:
- Loading the raw dataset file into pandas DataFrame.
- Exploring dataset summary and statistics.
- Dropping irrelevant columns.
- Encoding categorical features using LabelEncoder and OneHotEncoder.
- Scaling independent features using StandardScaler.
- Storing the processed dataset to a csv file.
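The preprocessing steps above can be sketched roughly as follows. The column names and the tiny inline sample are illustrative assumptions (a typical bank-churn schema), not the actual raw file; the notebook may differ in detail.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical sample standing in for the raw dataset; column names assumed.
df = pd.DataFrame({
    "CustomerId": [1, 2, 3, 4],
    "CreditScore": [619, 608, 502, 699],
    "Geography": ["France", "Spain", "France", "Germany"],
    "Gender": ["Female", "Male", "Female", "Male"],
    "Age": [42, 41, 42, 39],
    "Balance": [0.0, 83807.86, 159660.80, 0.0],
    "Exited": [1, 0, 1, 0],
})

# Drop irrelevant identifier columns.
df = df.drop(columns=["CustomerId"])

# Encode the binary categorical feature with LabelEncoder.
df["Gender"] = LabelEncoder().fit_transform(df["Gender"])

# One-hot encode the multi-class categorical feature
# (pd.get_dummies is equivalent in effect to OneHotEncoder here).
df = pd.get_dummies(df, columns=["Geography"], drop_first=True)

# Scale the independent features; keep the label unscaled.
features = df.drop(columns=["Exited"])
scaled = StandardScaler().fit_transform(features)
df_scaled = pd.DataFrame(scaled, columns=features.columns)
df_scaled["Exited"] = df["Exited"].values

# Store the processed dataset to a CSV file.
df_scaled.to_csv("processed.csv", index=False)
```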
We used two libraries to build and train the network:
TensorFlow: Corresponding notebook: ann-training-tensorflow.ipynb
PyTorch: Corresponding notebook: ann-training-pytorch.ipynb
TensorFlow - Model architecture and training details:
- ANN built with the TensorFlow Keras Sequential API.
- Default input layer.
- First hidden layer with 6 units and ReLU activation function.
- Second hidden layer with 6 units and ReLU activation function.
- Output layer with a single unit and Sigmoid activation function.
- Optimizer: Adam.
- Loss function: Binary Crossentropy.
- Batch size: 32
- Epochs: 100
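A minimal sketch of this TensorFlow architecture is shown below. The input dimension of 11 is an assumption (the typical feature count after the preprocessing above), and the `model.fit` call is shown commented out since it needs the actual training data.

```python
import numpy as np
import tensorflow as tf

# Sequential ANN: two hidden ReLU layers of 6 units, sigmoid output.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(11,)),          # feature count assumed
    tf.keras.layers.Dense(6, activation="relu"),
    tf.keras.layers.Dense(6, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])

# With real data: model.fit(X_train, y_train, batch_size=32, epochs=100)
preds = model.predict(np.zeros((2, 11)), verbose=0)  # one probability per row
```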
PyTorch - Model architecture and training details:
- Dataset loaded using the Dataset and DataLoader APIs.
- ANN built as a PyTorch nn.Module subclass.
- First hidden Linear layer with 6 units and ReLU activation function.
- Second hidden Linear layer with 6 units and ReLU activation function.
- Output Linear layer with a single unit and Sigmoid activation function.
- Optimizer: Adam.
- Loss function: Binary Crossentropy.
- Batch size: 32
- Epochs: 100
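The equivalent PyTorch module might look like the sketch below. As before, the input dimension of 11 and the class name `ChurnANN` are assumptions for illustration; `nn.BCELoss` is the binary cross-entropy loss matching the sigmoid output.

```python
import torch
from torch import nn

class ChurnANN(nn.Module):
    """Two hidden Linear+ReLU layers of 6 units, sigmoid output."""

    def __init__(self, n_features: int = 11):  # feature count assumed
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 6), nn.ReLU(),
            nn.Linear(6, 6), nn.ReLU(),
            nn.Linear(6, 1), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = ChurnANN()
criterion = nn.BCELoss()                          # binary cross-entropy
optimizer = torch.optim.Adam(model.parameters())

# Forward pass on dummy input; a real training loop would iterate a
# DataLoader with batch_size=32 for 100 epochs.
out = model(torch.zeros(2, 11))
```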
Evaluation method:
- Evaluation technique: train/test split.
- Test size: 20%
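The split can be done with scikit-learn's `train_test_split`; the dummy arrays and the `random_state` value here are placeholders for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy data standing in for the processed features and labels.
X = np.arange(100).reshape(50, 2)
y = np.arange(50) % 2

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```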