- Technologies Used
- Description
- Objectives
- Notebooks Overview
- Installation
- Usage
- Project Structure
- Collaborators
- License
This project builds a movie recommendation system on the MovieLens dataset. The system uses several machine learning techniques to provide personalized movie recommendations based on user preferences and past behavior.
The main objective of this project is to develop and evaluate different recommendation algorithms, including collaborative filtering, matrix factorization, and hybrid approaches, using the MovieLens dataset. The specific steps include:
- Data Preprocessing: Filtering and preparing the dataset for analysis.
- Exploratory Data Analysis (EDA): Understanding the dataset and its underlying patterns.
- Modeling: Implementing various models like Pearson correlation, SVD, and LightFM for recommendations.
- Evaluation: Assessing the performance of the models to identify the most effective approach.
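As an illustration of the Pearson correlation approach named in the modeling step, here is a minimal sketch of user-based collaborative filtering. It is not taken from the project's notebooks: the function names, the toy matrix, and the choice to use only positively correlated neighbours are assumptions made for the example.

```python
import numpy as np


def pearson_similarity(a, b):
    """Pearson correlation between two users over the movies both have rated.

    NaN marks a missing rating. Returns 0.0 when fewer than two movies overlap
    or when either user's co-rated ratings have zero variance.
    """
    both = ~np.isnan(a) & ~np.isnan(b)
    if both.sum() < 2:
        return 0.0
    x = a[both] - a[both].mean()
    y = b[both] - b[both].mean()
    denom = np.sqrt((x ** 2).sum() * (y ** 2).sum())
    return float((x * y).sum() / denom) if denom > 0 else 0.0


def predict_rating(ratings, user, movie):
    """Predict a rating as the user's mean plus the similarity-weighted,
    mean-centred ratings of positively correlated users who rated the movie."""
    user_mean = np.nanmean(ratings[user])
    num = den = 0.0
    for other in range(ratings.shape[0]):
        if other == user or np.isnan(ratings[other, movie]):
            continue
        sim = pearson_similarity(ratings[user], ratings[other])
        if sim <= 0:
            continue  # assumption: ignore dissimilar users
        num += sim * (ratings[other, movie] - np.nanmean(ratings[other]))
        den += sim
    # Fall back to the user's own mean when no usable neighbour exists.
    return float(user_mean + num / den) if den > 0 else float(user_mean)


# Hypothetical toy matrix: rows are users, columns are movies, NaN = not rated.
toy = np.array([
    [5.0, 4.0, 1.0, np.nan],
    [4.0, 5.0, 2.0, 5.0],
    [1.0, 2.0, 5.0, 1.0],
])

predicted = predict_rating(toy, user=0, movie=3)
```

In this toy data, user 2 rates the shared movies in exactly the opposite pattern to user 0, so the prediction for user 0 is driven by user 1 alone.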
- Dataframe_Filter.ipynb:
  - This notebook is essential for preparing the dataset. It filters the raw data and generates a CSV file that is necessary for the subsequent models.
  - Important: You must run this notebook first to create the CSV file that will be used by the Pearson, LightFM, and SVD models.
- Exploratory_Data_Analysis.ipynb:
  - Provides a comprehensive analysis of the dataset, including visualizations and insights into user ratings, movie genres, and other key aspects.
- NLP_Vectorizing.ipynb:
  - Applies Natural Language Processing (NLP) techniques to vectorize textual data (e.g., movie descriptions) for use in hybrid recommendation models.
- Pearson_Correlation.ipynb:
  - Implements a collaborative filtering model using Pearson correlation to recommend movies based on user similarity.
- SVD.ipynb:
  - Uses Singular Value Decomposition (SVD), a matrix factorization technique, to predict user ratings for movies.
- New_Model_LightFM.ipynb:
  - Develops a hybrid model using the LightFM library, combining both content-based and collaborative filtering approaches for recommendations.
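To make the matrix factorization idea behind SVD.ipynb concrete, the sketch below predicts ratings from a rank-k truncated SVD computed with NumPy. This is an assumption-laden illustration, not the notebook's code: the notebook may rely on a dedicated library or a different SVD variant, and the function name `svd_predict`, the mean-filling strategy, and the toy matrix are all hypothetical.

```python
import numpy as np


def svd_predict(ratings, k=2):
    """Rank-k reconstruction of a user x movie matrix (NaN = not rated).

    Assumes every user has at least one rating.
    """
    observed = ~np.isnan(ratings)
    # Fill each user's missing entries with that user's mean so the matrix is dense.
    user_means = np.nanmean(ratings, axis=1, keepdims=True)
    filled = np.where(observed, ratings, user_means)
    centred = filled - user_means
    # Keep only the k largest singular values (low-rank approximation).
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    approx = (u[:, :k] * s[:k]) @ vt[:k, :]
    # Add the means back and clip to the MovieLens rating scale (0.5 to 5.0).
    return np.clip(approx + user_means, 0.5, 5.0)


# Hypothetical toy matrix: rows are users, columns are movies.
toy_ratings = np.array([
    [5.0, 4.0, 1.0, np.nan],
    [4.0, 5.0, 2.0, 5.0],
    [1.0, 2.0, 5.0, 1.0],
])

predictions = svd_predict(toy_ratings, k=2)
```

A smaller `k` forces the model to generalize across users and movies, while a `k` equal to the matrix rank simply reproduces the filled-in input.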
Important: The project was developed and tested on Python 3.11.6.
To run this project locally, follow these steps:
- Clone the repository:
git clone https://github.com/jcrigoni/grand_ml_project
cd grand_ml_project
- Install requirements:
pip install -r requirements.txt
Important: LightFM relies on OpenMP for multithreading, which can be difficult to set up on Windows or macOS. In that case, it is better to run LightFM in Docker.
- Download movie.csv and rating.csv from https://www.kaggle.com/datasets/grouplens/movielens-20m-dataset , then run the Dataframe_Filter.ipynb notebook to create the CSV file the models need.
- After running the first notebook, you can proceed to run the other notebooks to explore the data, build models, and generate recommendations.
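For readers who want a feel for what the filtering step might involve, here is a rough pandas sketch. It is not the content of Dataframe_Filter.ipynb: the thresholds, the function name `filter_ratings`, and the output file name are assumptions, and only the column names (`userId`, `movieId`, `rating`, `title`, `genres`) follow the Kaggle MovieLens 20M files.

```python
import pandas as pd


def filter_ratings(ratings, movies, min_user_ratings=20, min_movie_ratings=50):
    """Keep active users and frequently rated movies, then attach movie metadata.

    Counts are computed once on the raw data; the thresholds are illustrative.
    """
    user_counts = ratings["userId"].value_counts()
    movie_counts = ratings["movieId"].value_counts()
    keep_users = user_counts[user_counts >= min_user_ratings].index
    keep_movies = movie_counts[movie_counts >= min_movie_ratings].index
    kept = ratings[
        ratings["userId"].isin(keep_users) & ratings["movieId"].isin(keep_movies)
    ]
    # An inner join drops ratings whose movie is missing from the metadata file.
    return kept.merge(movies, on="movieId", how="inner")


# Intended use with the downloaded files (paths and output name are hypothetical):
# ratings = pd.read_csv("rating.csv")
# movies = pd.read_csv("movie.csv")
# filter_ratings(ratings, movies).to_csv("filtered_ratings.csv", index=False)
```

Dropping rarely rated movies and inactive users tends to shrink the data considerably, which may also shorten the run time of the modeling notebooks.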
TIP: Some notebooks may take a while to run depending on the dataset size and complexity of the model. Please be patient!
📦 grand_ml_project/
├── 📁Data/
│ ├── 🐍Dataframe_Filter.ipynb
│ ├── 🐍Exploratory_Data_Analysis.ipynb
│ └── 🐍NLP_Vectorizing.ipynb
├── 📁Models/
│ ├── 🐍New_Model_LightFM.ipynb
│ ├── 🐍Pearson_Correlation.ipynb
│ ├── 🐍SVD.ipynb
│ ├── 🖼️banner.png
│ └── 📁Exported_Models/
│     └── 🗃️lightfm_recommendation_model.pkl
├── 📄requirements.txt
├── 📄README.md
├── 📄Project-Documentation_Movie_Recommendation_System_Kallel_Rigoni_Rodner.pdf
└── 📄.gitignore
This project was developed by a collaborative team. Each member played a crucial role in the research, development, and analysis:
- Mohamed Kallel
- Jean Christophe Rigoni
- Simon Pierre Rodner
This project is licensed under CC BY-NC 4.0. For more information, refer to the license file.