This project focuses on conducting Exploratory Data Analysis (EDA) and Feature Engineering on the Black Friday dataset. As a beginner, I aimed to apply foundational data science techniques, from cleaning and understanding the dataset to preparing it for model training.
The Black Friday dataset contains customer purchase details. The main objectives of this project are to:
- Explore the dataset to identify patterns and insights.
- Engineer features that could help in building predictive models.
- Prepare the data for training machine learning models.
Data Loading and Preprocessing:
- The dataset was loaded using pandas.
- Missing values were handled by separating rows on the Purchase column: rows with a missing Purchase value went to the test set, and the remaining rows went to the train set.
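The separation step can be sketched roughly as follows. This is a minimal illustration, assuming the rows with no Purchase value are the ones used as the test set. The small inline DataFrame is made up and stands in for the real file, which would normally be read with `pd.read_csv`; the `Age` column and all values are illustrative.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the real dataset (normally loaded with pd.read_csv).
df = pd.DataFrame(
    {
        "Product_ID": ["P001", "P002", "P003", "P004"],
        "Age": [1, 2, 3, 2],
        "Purchase": [8370.0, np.nan, 1422.0, np.nan],
    }
)

# Rows without a Purchase value have no target, so they form the test set.
missing_target = df["Purchase"].isnull()
df_test = df[missing_target]
df_train = df[~missing_target]

print(len(df_train), len(df_test))
```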
Exploratory Data Analysis (EDA):
- Descriptive statistics were computed to understand the dataset.
- Visualizations such as bar plots were used to explore purchase trends and customer demographics.
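As a rough sketch of this kind of exploration, the snippet below computes summary statistics and a per-group mean, which is the quantity a bar plot of purchases by demographic would display. The `Gender` column and all values are made up for illustration and are not taken from the notebook.

```python
import pandas as pd

# Made-up rows; Gender is an illustrative demographic column.
df = pd.DataFrame(
    {
        "Gender": ["F", "M", "M", "F", "M"],
        "Purchase": [1000, 3000, 5000, 2000, 4000],
    }
)

# Descriptive statistics: count, mean, std, min, quartiles, max.
summary = df["Purchase"].describe()

# Mean purchase per group: the numbers a bar plot would show.
mean_by_gender = df.groupby("Gender")["Purchase"].mean()

print(summary)
print(mean_by_gender)
# mean_by_gender.plot(kind="bar") would draw the bars (requires matplotlib).
```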
Feature Engineering:
- Unnecessary columns, like `Product_ID`, were dropped from the feature set.
- Numerical features were scaled using StandardScaler to prepare for machine learning models.
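A minimal sketch of dropping an identifier column and scaling what remains. The `Occupation` and `Marital_Status` columns and their values are illustrative assumptions, used here only to give the scaler something numeric to work on.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up rows for illustration.
df = pd.DataFrame(
    {
        "Product_ID": ["P001", "P002", "P003", "P004"],
        "Occupation": [10, 16, 15, 7],
        "Marital_Status": [0, 1, 0, 1],
    }
)

# Drop the identifier column from the feature set.
features = df.drop(columns=["Product_ID"])

# StandardScaler rescales each column to zero mean and unit variance.
scaled = StandardScaler().fit_transform(features)

print(scaled.mean(axis=0))
print(scaled.std(axis=0))
```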
Train-Test Split:
- The data was split into training and test sets using `train_test_split` from `sklearn`.
- The target variable was the `Purchase` column, and all other columns were used as features.
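The split described above might look like the sketch below. The `test_size` and `random_state` values are assumptions chosen for the example, not settings taken from the notebook, and the data is made up.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up rows for illustration.
df = pd.DataFrame(
    {
        "Occupation": [10, 16, 15, 7, 20, 9, 1, 12, 17, 4],
        "Marital_Status": [0, 1, 0, 1, 1, 0, 0, 1, 0, 1],
        "Purchase": [8370, 15200, 1422, 1057, 7969,
                     15227, 19215, 15854, 15686, 7871],
    }
)

# Purchase is the target; every other column is a feature.
X = df.drop(columns=["Purchase"])
y = df["Purchase"]

# test_size and random_state are assumed values for this sketch.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
```

Fixing `random_state` makes the split reproducible between runs, which helps when comparing models later.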
Feature Scaling:
- Features were standardized (zero mean, unit variance), which matters for models that are sensitive to feature scale, such as distance-based or gradient-descent-based ones.
- The dataset is ready to be used for training machine learning models.
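One common way to apply the scaling, which may or may not match what the notebook does, is to fit the scaler on the training split only and reuse its statistics on the test split, so that no information from the test data leaks into preprocessing. The arrays below are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up feature matrices standing in for the train and test splits.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
X_test = np.array([[2.5, 25.0], [5.0, 50.0]])

scaler = StandardScaler()

# Learn the mean and standard deviation from the training data only.
X_train_scaled = scaler.fit_transform(X_train)

# Reuse the training statistics on the test data.
X_test_scaled = scaler.transform(X_test)

print(scaler.mean_)
print(X_test_scaled)
```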
Libraries used:
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn
- Clone the repository:
git clone https://github.com/YashsTiwari/BlackFriday-EDA-and-Feature-Engineering.git
- Install the required libraries:
pip install -r requirements.txt
- Run the Jupyter notebook to view the analysis.