
MansiGit/Amazon-Gourmet-Groceries-Recommender-System


Amazon-Gourmet-Groceries-Recommender-System

Problem & Dataset

  • Problem: how to find the best recommendations for a user based on their likes and dislikes.
  • Dataset: Amazon Grocery and Gourmet Food reviews plus the accompanying product metadata.

The steps involved are as follows:

Exploratory Data Analysis

  • The Amazon dataset comes with many features; we used ASIN, ReviewerID, Rating, Category, Title, and Description.
  • Average number of reviews per product: 27.69.
  • Average number of reviews per reviewer: ~9.
  • 5 stars is the most frequently given rating.
  • The user-item matrix has a sparsity of 0.794.
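The statistics above can be reproduced with something like the following minimal sketch. It assumes the reviews are loaded as (reviewer_id, asin, rating) tuples, and `summarize` is a hypothetical helper. Sparsity is taken here as the fraction of empty cells in the user-item matrix, which is one common definition; the exact definition used in the report is not stated here.

```python
from collections import Counter


def summarize(ratings):
    """ratings: list of (reviewer_id, asin, rating) tuples, one per review."""
    per_product = Counter(asin for _, asin, _ in ratings)
    per_reviewer = Counter(user for user, _, _ in ratings)
    rating_counts = Counter(r for _, _, r in ratings)
    n_users, n_items = len(per_reviewer), len(per_product)
    # Observed (user, item) cells; repeated reviews of the same pair count once.
    observed = len({(u, i) for u, i, _ in ratings})
    return {
        "avg_reviews_per_product": len(ratings) / n_items,
        "avg_reviews_per_reviewer": len(ratings) / n_users,
        "most_common_rating": rating_counts.most_common(1)[0][0],
        # Sparsity = fraction of the user-item matrix that has no rating.
        "sparsity": 1 - observed / (n_users * n_items),
    }
```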

Data Preparation

  • The user-item matrix is both large and mostly empty (127,496 × 41,320, about 5.3 billion cells), so a dense representation does not fit in RAM.
  • Step 1: Drop columns that have no impact on rating prediction, such as image, reviewername, and summary.
  • Step 2: Keep only the rows where the reviewer is verified.
  • Step 3: Remove reviewers who have reviewed fewer than 20 products.
  • Step 4: Group by reviewer so that each unique reviewer maps to the products they reviewed and the ratings they gave.
  • Step 5: Split the processed data into training and test sets (80/20) using a train-test split.
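Below is a minimal sketch of Steps 1 to 5 in plain Python, assuming each review is a dict. Several details are assumptions:

  • The field names `reviewerID`, `asin`, `overall`, and `verified` are modeled on the raw Amazon review JSON.
  • `prepare` is a hypothetical helper.
  • Step 5 is assumed to be a random split over individual (reviewer, product, rating) triples.

The project itself may have used pandas and a library split function instead.

```python
import random
from collections import Counter, defaultdict

# Assumed field names (modeled on the raw Amazon review JSON); adjust to the real schema.
KEEP = ("reviewerID", "asin", "overall")


def prepare(reviews, min_reviews=20, test_frac=0.2, seed=42):
    # Step 1: keep only the fields needed for rating prediction
    # (drops image, reviewerName, summary, ...).
    # Step 2: keep only verified reviews.
    rows = [{k: r[k] for k in KEEP} for r in reviews if r.get("verified")]
    # Step 3: drop reviewers with fewer than `min_reviews` (verified) reviews.
    counts = Counter(r["reviewerID"] for r in rows)
    rows = [r for r in rows if counts[r["reviewerID"]] >= min_reviews]
    # Step 4: map each reviewer to the (product, rating) pairs they provided.
    by_reviewer = defaultdict(list)
    for r in rows:
        by_reviewer[r["reviewerID"]].append((r["asin"], r["overall"]))
    # Step 5: shuffle (reviewer, product, rating) triples and split them 80/20.
    triples = [(u, i, y) for u, items in by_reviewer.items() for i, y in items]
    random.Random(seed).shuffle(triples)
    n_test = round(len(triples) * test_frac)
    return dict(by_reviewer), triples[n_test:], triples[:n_test]
```

Usage: `grouped, train, test = prepare(reviews)` returns the per-reviewer mapping plus the training and test triples.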

Modelling

We explored the following approaches for rating prediction and compared their performance:

  1. Baseline
  2. Singular Value Decomposition (SVD)
  3. k Nearest Neighbors (kNN)
  4. Slope One
  5. Matrix Factorization
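The Baseline approach is not spelled out further in this README. One common formulation predicts a rating as the global mean plus a user bias and an item bias, r̂_ui = μ + b_u + b_i. The sketch below fits the biases with a simple regularized-mean heuristic. The helper `fit_baseline` and the `reg_item`/`reg_user` values are illustrative assumptions, not the project's implementation.

```python
from collections import defaultdict


def fit_baseline(train, reg_item=10.0, reg_user=15.0):
    """Fit r_hat(u, i) = mu + b_u + b_i from (user, item, rating) triples.

    Biases are shrunken mean deviations; reg_item/reg_user are illustrative values.
    """
    mu = sum(r for _, _, r in train) / len(train)
    # Item bias: regularized mean deviation of the item's ratings from mu.
    item_dev, item_n = defaultdict(float), defaultdict(int)
    for _, i, r in train:
        item_dev[i] += r - mu
        item_n[i] += 1
    b_i = {i: item_dev[i] / (reg_item + item_n[i]) for i in item_dev}
    # User bias: regularized mean deviation after removing mu and the item bias.
    user_dev, user_n = defaultdict(float), defaultdict(int)
    for u, i, r in train:
        user_dev[u] += r - mu - b_i[i]
        user_n[u] += 1
    b_u = {u: user_dev[u] / (reg_user + user_n[u]) for u in user_dev}

    def predict(u, i):
        # Unseen users or items fall back to a zero bias.
        return mu + b_u.get(u, 0.0) + b_i.get(i, 0.0)

    return predict
```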

Rating Prediction Approach 1

K Nearest Neighbors: uses feature similarity to predict new data points.

User-based CF:

  • Identify the users with the most similar 'interaction profile'.
  • Suggest the items that are most popular among these neighbors.

Item-based CF:

  • Find items similar to the ones the user has already interacted with positively.
  • Suggest items that most users interact with, e.g. milk and eggs in the grocery dataset.
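Below is a minimal user-based kNN sketch, assuming ratings are held as nested dicts `{user: {item: rating}}`. Similarity is cosine over co-rated items. The prediction is a similarity-weighted average of the k nearest neighbors' ratings. The similarity measure, `k=20`, and the helper names are assumptions. Transposing the dict swaps the roles of users and items, so the same function also gives item-based CF.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse rating dicts, computed over their shared keys."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[k] * b[k] for k in common)
    norm_a = math.sqrt(sum(a[k] ** 2 for k in common))
    norm_b = math.sqrt(sum(b[k] ** 2 for k in common))
    return dot / (norm_a * norm_b)


def knn_predict(ratings, user, item, k=20):
    """Similarity-weighted average of the k most similar users' ratings of `item`.

    ratings: {user: {item: rating}}. Returns None if no positively similar
    neighbor has rated `item`.
    """
    neighbors = [
        (cosine(ratings[user], ratings[v]), r[item])
        for v, r in ratings.items()
        if v != user and item in r
    ]
    top = sorted(neighbors, reverse=True)[:k]
    weight = sum(s for s, _ in top if s > 0)
    if weight == 0:
        return None
    return sum(s * y for s, y in top if s > 0) / weight


def transpose(ratings):
    """Swap users and items: {user: {item: r}} -> {item: {user: r}}."""
    out = {}
    for u, items in ratings.items():
        for i, y in items.items():
            out.setdefault(i, {})[u] = y
    return out
```

For item-based CF, call `knn_predict(transpose(ratings), item, user)`. In that case the neighbors are the items the user has already rated.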

Rating Prediction Approach 2

Slope One (weighted) draws on two additional sources of information:

  • Ratings from other users who have rated items in common with the target user
  • The target user's ratings of other items
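The weighted Slope One scheme (Lemire and Maclachlan) can be sketched as follows, assuming ratings are stored as `{user: {item: rating}}` dicts. For a target item j, every item i the user has rated contributes the average deviation dev(j, i) plus the user's rating of i. Each contribution is weighted by the number of users who rated both items. The helper names are hypothetical.

```python
from collections import defaultdict


def fit_slope_one(ratings):
    """Return dev[j][i] = mean of (r_uj - r_ui) over users who rated both j and i,
    and count[j][i] = the number of such users. ratings: {user: {item: rating}}."""
    diff = defaultdict(lambda: defaultdict(float))
    count = defaultdict(lambda: defaultdict(int))
    for items in ratings.values():
        for j, r_j in items.items():
            for i, r_i in items.items():
                if i != j:
                    diff[j][i] += r_j - r_i
                    count[j][i] += 1
    dev = {j: {i: diff[j][i] / count[j][i] for i in diff[j]} for j in diff}
    return dev, {j: dict(c) for j, c in count.items()}


def predict_weighted_slope_one(dev, count, user_ratings, j):
    """Weight each (dev[j][i] + r_ui) by count[j][i]; None if no co-rated items."""
    num = den = 0.0
    for i, r_ui in user_ratings.items():
        c = count.get(j, {}).get(i, 0)
        if i != j and c > 0:
            num += (dev[j][i] + r_ui) * c
            den += c
    return num / den if den else None
```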

Rating Prediction Approach 3

Latent Factor Model (SVD):

  • Users and items are mapped into a shared latent factor space.

  • q_i: item-concept mapping for item i.

  • p_u: user-concept mapping for user u.

  • Funk SVD, trained with stochastic gradient descent, was used to minimize the regularized squared error, with learning rate = 0.009 and regularization constant = 0.05.
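With p_u and q_i as above, a latent factor model predicts r̂_ui = q_i^T p_u. Funk SVD fits the factors by stochastic gradient descent on the sum of (r_ui - q_i^T p_u)^2 + λ(|p_u|^2 + |q_i|^2) over the observed ratings.

The minimal sketch below uses the stated learning rate (0.009) and regularization constant (0.05). The following are assumptions, not the project's settings:

  • the number of factors, the number of epochs, and the initialization
  • the absence of user and item bias terms

```python
import random


def funk_svd(train, n_factors=20, lr=0.009, reg=0.05, n_epochs=20, seed=0):
    """SGD on sum((r_ui - q_i . p_u)^2) + reg * (|p_u|^2 + |q_i|^2).

    train: list of (user, item, rating) triples. Returns a predict(u, i) function.
    """
    rng = random.Random(seed)
    users = {u for u, _, _ in train}
    items = {i for _, i, _ in train}
    # Small random initialization of the latent factors (an assumption).
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}
    data = list(train)
    for _ in range(n_epochs):
        rng.shuffle(data)
        for u, i, r in data:
            pu, qi = p[u], q[i]
            err = r - sum(a * b for a, b in zip(pu, qi))
            for f in range(n_factors):
                puf, qif = pu[f], qi[f]
                # Both updates use the factor values from before this step.
                pu[f] += lr * (err * qif - reg * puf)
                qi[f] += lr * (err * puf - reg * qif)

    def predict(u, i):
        # No factors exist for users or items unseen in training.
        if u not in p or i not in q:
            return None
        return sum(a * b for a, b in zip(p[u], q[i]))

    return predict
```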

Item Recommendation

  • Step 1: Create a prediction dataframe from the algorithm's output, with columns reviewerid, productid, and the predicted rating.
  • Step 2: Build a nested list that holds, for each reviewer, the (productid, predicted_rating) tuples.
  • Step 3: Sort each reviewer's list by predicted rating in descending order and keep the top 10 products.
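Steps 2 and 3 can be sketched as below. This assumes the prediction dataframe is available as (reviewerid, productid, predicted_rating) rows, for instance via pandas `DataFrame.itertuples(index=False)` on a three-column frame. `top_n` is a hypothetical helper.

```python
import heapq
from collections import defaultdict


def top_n(predictions, n=10):
    """predictions: iterable of (reviewerid, productid, predicted_rating) rows.

    Returns {reviewerid: [(productid, predicted_rating), ...]}, best first.
    """
    # Step 2: group (productid, predicted_rating) tuples by reviewer.
    per_reviewer = defaultdict(list)
    for reviewer, product, score in predictions:
        per_reviewer[reviewer].append((product, score))
    # Step 3: keep the n highest-scoring products per reviewer, in descending order.
    return {
        reviewer: heapq.nlargest(n, items, key=lambda pair: pair[1])
        for reviewer, items in per_reviewer.items()
    }
```

`heapq.nlargest` avoids fully sorting long per-reviewer lists when only the top 10 are needed.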

Metrics

Future Work

  • Incorporate more features, such as price ranges and product seasonality.
  • Explore the effect of NLP techniques and of implicit feedback.
  • Fairness and unbiasedness of recommender systems: provide users with feedback on their recommendations, for example by showing the factors that contributed to a particular recommendation or by letting users flag recommendations they do not find relevant.
  • Privacy protection for recommender systems: collect only the data that is necessary, give users control over their data, and anonymize it.

The approaches and the results are described in detail in the attached report.

