This project implements several recommender systems on the MovieLens 1M dataset to provide personalized movie recommendations. The dataset contains over one million ratings, and the goal is to predict the rating a user would give a movie using different algorithms. The implementation covers naive approaches, UV matrix decomposition, and matrix factorization, and evaluates their accuracy with the RMSE and MAE metrics under 5-fold cross-validation. In addition, PCA, t-SNE, and UMAP were used to visualize the user and movie vectors produced by the matrix factorization algorithms, to better understand the dataset's characteristics.
- Naive Approaches: These methods predict unknown ratings using the global average rating, the average rating per movie, the average rating per user, and a tuned linear combination of these averages. They are straightforward and serve as a baseline for the more complex algorithms.
- UV Matrix Decomposition: This technique seeks two low-rank matrices $U$ and $V$ that minimize the mean squared error of $M - UV$ over the known entries of the ratings matrix $M$. To this end, it iterates through each element of $U$ and $V$ and sets it to the value that minimizes the MSE given the current values of all other elements.
- Matrix Factorization: Building on UV matrix decomposition, matrix factorization also approximates the ratings matrix as the product of two lower-dimensional matrices. Instead of element-wise updates, however, it learns the factors with gradient descent and regularization.
- Visualization: PCA, t-SNE, and UMAP reduce the dimensionality of the learned user and movie vectors for plotting, with the aim of revealing patterns and clusters related to movie genres and user demographics.
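A minimal NumPy sketch of the four naive predictors, assuming ratings are stored as parallel arrays of 0-based user indices, movie indices, and rating values. The names `fit_naive` and `predict_blend` are hypothetical, and fitting the blend weights by ordinary least squares on the training ratings is an assumption about how the combination is tuned, not necessarily what this project does.

```python
import numpy as np

def fit_naive(users, items, ratings, n_users, n_items):
    """Fit global, per-user and per-movie averages plus a linear blend.

    users, items: 0-based integer index arrays; ratings: float array.
    Users or movies with no ratings fall back to the global mean.
    """
    global_mean = ratings.mean()

    # Per-user and per-movie sums and counts via bincount.
    user_cnt = np.bincount(users, minlength=n_users)
    item_cnt = np.bincount(items, minlength=n_items)
    user_sum = np.bincount(users, weights=ratings, minlength=n_users)
    item_sum = np.bincount(items, weights=ratings, minlength=n_items)
    user_avg = np.where(user_cnt > 0, user_sum / np.maximum(user_cnt, 1), global_mean)
    item_avg = np.where(item_cnt > 0, item_sum / np.maximum(item_cnt, 1), global_mean)

    # Assumed blend: r ~ a * user_avg + b * item_avg + c, fitted by least squares.
    X = np.column_stack([user_avg[users], item_avg[items], np.ones(len(ratings))])
    coef, *_ = np.linalg.lstsq(X, ratings, rcond=None)
    return global_mean, user_avg, item_avg, coef

def predict_blend(model, users, items):
    """Predict ratings with the fitted linear combination of averages."""
    _, user_avg, item_avg, (a, b, c) = model
    return a * user_avg[users] + b * item_avg[items] + c
```

The global, user, and movie baselines are simply `model[0]`, `model[1][users]`, and `model[2][items]`.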
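The element-wise update used by UV decomposition has a closed form. Because an entry $u_{rs}$ of $U$ only affects row $r$ of $UV$, its MSE-minimizing value with everything else held fixed is

$$u_{rs} = \frac{\sum_j v_{sj}\left(m_{rj} - \sum_{k \ne s} u_{rk} v_{kj}\right)}{\sum_j v_{sj}^2}$$

where $j$ ranges over the movies user $r$ has rated; entries of $V$ are updated symmetrically. The sketch below applies one such sweep. The function name `uv_sweep` is hypothetical, and storing $M$ densely with a boolean mask of known entries is an assumption that suits small matrices.

```python
import numpy as np

def uv_sweep(M, mask, U, V):
    """One pass of element-wise optimal updates, modifying U and V in place.

    M: n x m ratings, mask: boolean n x m of known entries,
    U: n x d, V: d x m.
    """
    n, d = U.shape
    m = V.shape[1]
    for r in range(n):
        cols = mask[r]                        # movies rated by user r
        for s in range(d):
            v_s = V[s, cols]
            denom = np.dot(v_s, v_s)
            if denom == 0:
                continue
            # Row-r prediction with the contribution of u_rs removed.
            partial = U[r] @ V[:, cols] - U[r, s] * v_s
            U[r, s] = np.dot(v_s, M[r, cols] - partial) / denom
    for s in range(m):
        rows = mask[:, s]                     # users who rated movie s
        for r in range(d):
            u_r = U[rows, r]
            denom = np.dot(u_r, u_r)
            if denom == 0:
                continue
            # Column-s prediction with the contribution of v_rs removed.
            partial = U[rows] @ V[:, s] - u_r * V[r, s]
            V[r, s] = np.dot(u_r, M[rows, s] - partial) / denom
```

Since each update is an exact one-variable minimizer, the MSE on known entries never increases from one sweep to the next.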
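Matrix factorization with regularization can be sketched as stochastic gradient descent on the squared error plus an L2 penalty, $\sum (r_{ui} - p_u \cdot q_i)^2 + \lambda(\lVert p_u\rVert^2 + \lVert q_i\rVert^2)$, assuming ratings are stored as parallel arrays of 0-based user indices, movie indices, and values. Plain SGD without user or movie bias terms, the function name `train_mf`, and the default hyperparameters (`k`, `lr`, `reg`, `epochs`) are assumptions for illustration, not the project's settings.

```python
import numpy as np

def train_mf(users, items, ratings, n_users, n_items, k=10, lr=0.01,
             reg=0.05, epochs=20, seed=0):
    """Learn user factors P (n_users x k) and movie factors Q (n_items x k)."""
    rng = np.random.default_rng(seed)
    P = rng.normal(0.0, 0.1, (n_users, k))    # user factor vectors
    Q = rng.normal(0.0, 0.1, (n_items, k))    # movie factor vectors
    order = np.arange(len(ratings))
    for _ in range(epochs):
        rng.shuffle(order)                    # visit ratings in random order
        for idx in order:
            u, i, r = users[idx], items[idx], ratings[idx]
            err = r - P[u] @ Q[i]
            pu = P[u].copy()                  # keep old p_u for the q_i step
            # Gradient step on squared error + L2 penalty (constant 2 folded into lr).
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * pu - reg * Q[i])
    return P, Q
```

A predicted rating is then `P[u] @ Q[i]`, and the rows of `P` and `Q` are the vectors one would pass to a dimensionality-reduction step.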
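PCA, the simplest of the three projections, can be sketched with a singular value decomposition, assuming the learned movie (or user) vectors are the rows of a matrix such as a factor matrix from matrix factorization. `pca_2d` is a hypothetical helper; t-SNE and UMAP are not covered here, and for those one would typically use scikit-learn's `sklearn.manifold.TSNE` or the `umap-learn` package's `umap.UMAP`.

```python
import numpy as np

def pca_2d(X):
    """Project the rows of X onto their top two principal components."""
    Xc = X - X.mean(axis=0)                   # center each latent dimension
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T                      # n x 2 coordinates for plotting
```

The resulting coordinates can be scatter-plotted and colored by genre or by user demographic to look for clusters.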
Each algorithm was evaluated with 5-fold cross-validation; the table below reports the mean RMSE and MAE across the five folds.
Algorithm | Mean RMSE | Mean MAE |
---|---|---|
Naive Approach - Global Average | 1.423 | 0.871 |
Naive Approach - User Average | 1.155 | 0.794 |
Naive Approach - Movie Average | 1.038 | 0.751 |
Naive Approach - Linear Combo | 0.894 | 0.675 |
UV Matrix Decomposition | 0.938 | 0.654 |
Matrix Factorization | 0.848 | 0.642 |
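The evaluation protocol can be sketched as follows: shuffle the ratings, split them into five folds, train on four, score the held-out fold, and average RMSE and MAE across folds. `cross_validate` and its `fit`/`predict` callback interface are hypothetical, and a random per-rating split is an assumption, since the exact split strategy is not specified.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))

def cross_validate(users, items, ratings, fit, predict, k=5, seed=0):
    """Return (mean RMSE, mean MAE) over k random folds of the ratings."""
    idx = np.random.default_rng(seed).permutation(len(ratings))
    folds = np.array_split(idx, k)
    scores = []
    for f in range(k):
        test = folds[f]
        train = np.concatenate([folds[g] for g in range(k) if g != f])
        model = fit(users[train], items[train], ratings[train])
        pred = predict(model, users[test], items[test])
        scores.append((rmse(ratings[test], pred), mae(ratings[test], pred)))
    mean_rmse, mean_mae = np.mean(scores, axis=0)
    return float(mean_rmse), float(mean_mae)
```

For example, the global-average baseline plugs in as `fit=lambda u, i, r: r.mean()` and `predict=lambda m, u, i: np.full(len(u), m)`.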