Skip to content

A team work of a capstone project. The correlation between features were discovered. cost and store sales can be predicted by other store features with random forest regressor and multiple linear regression model. Customer membership card can be predicted with random forest classifier and deep learning models. Tableau was used for visualization.

Notifications You must be signed in to change notification settings

CelineWW/FoodMart_Media_Campaign

Repository files navigation

FoodMart_Media_Campaign

This is a team work of a capstone project.

Overview

  • This project collected a demographic information of customer data as well as store, product, and promotion data. SQL and AWS database were used to store and query the data. Raw data was divdided into multiple tables and integrated according to column properties.
  • To discover customer segment distributions, the factors of media cost and store sales, Tableau public was used to analyze and visualize the results. - random forest classification, deep learning, random forest regressor and multiple linear regression model were applied to explore the correlation between member card, occupation, cost, store sales and other features, and make predictions.
    • Having tested several models, RandomForest Classifier & RandomForest Regression proved the best performers for our dataset.
    • We explored grouping & aggregating variables in our ML models to positive effect
    • KMeans was not viable. Our dataset refused to cluster
  • a flask app of webpage library was built to summarize and display the team work.

Data Source

From kaggle dataset:

https://www.kaggle.com/datasets/ramjasmaurya/medias-cost-prediction-in-foodmart?select=media+prediction+and+its+cost.csv

Why this data source was chosen and what result were expecting to get out?

Cost Prediction sounded interested since we are looking for a data source related to accounting (profit, revenue, overhead, marketing cost, how to attract new customers to increase sales). The description of this data source “Predict cost on Media campaigns in food Mart of USA” sounded very instated and had the characteristics we were looking for. After reviewing the contents of the data source, here are some of the questions we wanted to answer by the data source:

  • How many stores there are?
  • In which cities and states these stores are?
  • Do costumer demographics link to membership card tiers?
  • Is the cost of sales promotions linked to customers demographics?
  • What is media cost, by state, by store type, by food department?
  • Which are top media promotions, the top unit of sales?
  • What are the sales, by department, store, food category?

Tools

  • SQL
  • Tableau
  • AWS
  • Javascript
  • Machine Learning through Python
  • Flask App

Results

ERD_SQL_AWS

Final_ERD

Machine Learning

  1. Data Preprocessing
  • In preprocessing we elected to Use LabelEncoder for the majority of our models to limit expansion of our dataset. Alternate encoders, such as get_dummies and OneHotEncoder, would have resulted in a combined dataset of over 300 potential features. Additionally we experimented with grouping and aggregating the data to positive effect. For example- grouping by physical store characteristic columns such as store square footage to identify distinct stores and then aggregating certain metrics of those stores to find average promotion costs and average store sales.
  1. Feature engineering & selection
  • features were categorized by Product, Store, Promotion, Customer & Location. Several of these categories were analyzed individually as well as in combination to investigate feature importance by different categories. Questions such as:
    • Can customer demographics predict Membership Card Tier?
    • What Store features are most important for predicting store sales?
    • Do different stores prefer certain Promotions (promotion_name)?
    • Can the cost of a saLes promotion be predicted by customer demographic features?
  1. Description of how data was split into training and testing sets
  • using sklearn's train_test_split function, the data was split at 75% training data and 25% test data.
  1. Explanation of model choice, including limitations and benefits
  • Final ML models were RandomForestClassifier, RandomForestRegressor, & LinearRegression. When RandomForestClassifier & LinearRegression were not viable, RandomForestRegression was very useful for evaluation of continuous variables.
  1. Explanation of changes in model choice (if changes occurred between the Segment 2 and Segment 3 deliverables)
  • Our Deep learning Nearal Network was dropped in favor of RandomForestClassifier which outperformed the Neural network in terms of accuracy and was quicker to execute.
  1. How was the model was trained or retrained
  • RandomForestClassifier models were retrained by increasing the quantity of decision trees which had a positive effect on accuracy & RMSE scores.
  1. Description and explanation of model’s confusion matrix, including final accuracy score
  • RandomForestClassifier modeling across all membership card types (Gold, Silver, Bronze, and Normal) was 94%. As displayed by the following Confusion Matrix, recall was lowest for the Silver Card category. This may be explained by the relatively small size of the Silver Card cohort, compared to the other Membership Card categories. Screen Shot 2022-11-02 at 10 12 33 PM

Screen Shot 2022-11-02 at 10 14 42 PM

  • Using RandomForestRegressor to analyze Agrocery store's profits based on the stores physical attributes retured a Root Means Square Error of 0.223 indicating that store features have strong predective value when analysing revenue. Screen Shot 2022-11-02 at 10 21 04 PM

  • It was also observed that the geographical location of these stores was an important feature for predicting the Cost of a sales promotion.

    • With City & State Screen Shot 2022-11-02 at 10 26 17 PM

    • Without City & State Screen Shot 2022-11-02 at 10 27 19 PM

Webpage Overview

Check out our webpage here: https://foodmart-app.herokuapp.com/ Attention: Heroku shut down its free tier since 11/28/2022. So Webpage is currently not accessible on Heroku. Sorry for the inconvenience.

01_Homepage

03_filter search

04_SQL

05_heatmap

07_machine learning_bubble chart

07_maching learning code

08_tableau_store

09_tableau_customer

11_summary

10_our team

Tableau Visualization and results

  1. Customer Demographics

https://public.tableau.com/app/profile/carlos2209/viz/FoodMartDemographics/DemographicsStoryForFinal?publish=yes

How many memberships are there?

Screen Shot 2022-10-31 at 7 39 06 PM

Who makes up these memberships?

Screen Shot 2022-10-31 at 7 39 40 PM

Where do they shop? Screen Shot 2022-10-31 at 7 40 10 PM

  1. Food Mart Media Cost, Promotions and Sales

https://public.tableau.com/app/profile/hilda.vazqez/viz/FoodMartMediaCostPromotionsandSales_16672708652530/FoodMartMediaCostPromorionsandSales?publish=yes

  • Media cost by Store Type:

Supermarkets are the costliest but also where more customers shop and in consequence the highest on sales.

image

  • Promotions Cost and Sales Results by Store

There are 49 different types of promotions but in general the costliest in total for all Store Types is the Weekend Markdown. However, this varies by store type:

 . Supermarket – Weekend Markdown with a total cost of $160,008    
 . Deluxe Supermarket - Price Savers with a total cost of $101,706    
 . Gourmet Supermarket – Tow Day Sale with a total cost of $62,223    
 . Small Grocery – Sales Galore with a total cost of $7,774

The cost or promotion will not determine the sales results; however Weekend Markdown has the highest sales in total for all Stores Types

. Supermarket – Weekend Markdown with a store total sale of $11,000,000
. Deluxe Supermarket – Cash Register Lottery with a store total sale of $10,000,000
. Gourmet Supermarket – Two Day Sale with a store total sale of $3,000,000
. Small Grocery – Two Day Sale with a store total sale of $510,000

image

  • Media Cost and Total Sales by Food Department

Produce is the one with the highest cost of media but also Produce is the one with the highest sales

. Produce cost of media $535, 956
 . Produce total sale $36, 516,000

image

  • Sales by Media Type

“Daily paper, radio” has the highest sales, followed by “Daily Paper". Media type could be more cost efficient combined then buying it individually with still good sales results.

 . "Daily paper,radio" total sales $32,669,000

image

  • Sales by City and State

This data set is presenting Food Mart metrics in the states of Washington, Oregon, and California. Here are the total sales for each city and state:

image

image

Summary

- Membership Card tiers are strongly linked with customer demographics. 
     With deep learning model, we can predict Membership Card with accuracy of 0.970. The model here is binary classification. Deep Neural Network MultiClass Classification will be a better choice for this prediction.
     With random forest model, prediction accuracy can be up to 0.982. 
     Advertising of Membership Card tiers should be targeted via segmentation.

- Media Cost can also be predicted by customer demographics with random forest regressor (RMSE: 0.637).  
     Owners & Investors should pay close attention the makeup of their customers when projecting Sales Promotion budgets.
- For media type, Daily Paper advertisements are strongly correlated to store sales, 
     thus newspaper promotions are highly recommended.
- Supermarkets have the most of customers & spend more money on advertisements (highest Media Cost by store type).
     Weekend Markdown is the most effective way to stimulate customers to shop in Supermarkets & 
     Supermarket also invests the most money in their Weekend Markdown promotions.
- Store characteristics (Square Feet, Store Cost, Coffee Bar, Video Store, etc.) are strong predictors of Store Sales. 
     Investor should expect to invest in a grocery store’s physical features, amenities if they want to maximize store sales.

About

A team work of a capstone project. The correlation between features were discovered. cost and store sales can be predicted by other store features with random forest regressor and multiple linear regression model. Customer membership card can be predicted with random forest classifier and deep learning models. Tableau was used for visualization.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published