Overview
- This project uses the MovieLens 25M dataset.
- It covers data processing, exploratory analysis, and movie recommendation using machine learning.
Installation
pip install pandas numpy scikit-learn scikit-surprise plotly matplotlib "dask[complete]"
Data Setup
- Download MovieLens 25M from MovieLens.
- Unzip ml-25m.zip in the project root and check that the CSV files are inside.
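A quick way to confirm the data setup is to check for the CSV files that ship with the MovieLens 25M archive. The sketch below is only an illustration: the helper name `missing_csv_files` is hypothetical, and the `ml-25m` folder name assumes the archive was extracted with its default directory.

```python
from pathlib import Path

# File names shipped in the MovieLens 25M archive.
EXPECTED_FILES = [
    "ratings.csv",
    "movies.csv",
    "tags.csv",
    "links.csv",
    "genome-scores.csv",
    "genome-tags.csv",
]


def missing_csv_files(data_dir):
    """Return the expected CSV files that are absent from data_dir."""
    data_dir = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (data_dir / name).is_file()]


if __name__ == "__main__":
    # Assumption: the archive was extracted to ./ml-25m in the project root.
    missing = missing_csv_files("ml-25m")
    print("All CSV files found." if not missing else f"Missing: {missing}")
```

If the notebook reads the files from a different location, pass that path instead.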
Usage
- Run rc_final.ipynb.
- Run a single part at a time.
- Use the factor_of_data variable to load only a subset of the original MovieLens dataset.
- Follow the comments in the notebook for guidance.
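How factor_of_data is interpreted is defined in the notebook. As a rough sketch of the idea, the example below assumes it is a fraction in (0, 1] and keeps all ratings from a random fraction of users. The helper `take_subset` and the toy ratings are hypothetical, and the notebook may subsample differently.

```python
import pandas as pd


def take_subset(ratings, factor_of_data, seed=42):
    """Keep every rating from a random fraction of users (hypothetical helper)."""
    # Assumption: factor_of_data is a fraction, e.g. 0.1 keeps about 10% of users.
    if not 0 < factor_of_data <= 1:
        raise ValueError("factor_of_data must be in (0, 1]")
    users = pd.Series(ratings["userId"].unique())
    kept_users = users.sample(frac=factor_of_data, random_state=seed)
    return ratings[ratings["userId"].isin(kept_users)]


# Toy data using the column names of ratings.csv.
ratings = pd.DataFrame({
    "userId": [1, 1, 2, 3, 3, 4],
    "movieId": [10, 20, 10, 30, 20, 10],
    "rating": [4.0, 3.5, 5.0, 2.0, 4.5, 3.0],
})
subset = take_subset(ratings, factor_of_data=0.5)
print(subset["userId"].nunique(), "of", ratings["userId"].nunique(), "users kept")
```

Sampling whole users rather than individual rows keeps each user's rating history intact, which tends to matter for collaborative filtering.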
Features
- Data merging and preprocessing.
- Exploratory analysis with basic stats and plots.
- Recommendation models using Surprise and Scikit-Learn.
- Output predictions to CSV.
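The merging and CSV output steps can be sketched with pandas as follows. This is a minimal illustration on toy rows, not the notebook's code: the `predicted_rating` column name and its value are placeholders, and an in-memory buffer stands in for an output file.

```python
import io

import pandas as pd

# Toy rows using the column names of ratings.csv and movies.csv.
ratings = pd.DataFrame({
    "userId": [1, 1, 2],
    "movieId": [1, 2, 1],
    "rating": [4.0, 3.0, 5.0],
})
movies = pd.DataFrame({
    "movieId": [1, 2],
    "title": ["Toy Story (1995)", "Jumanji (1995)"],
    "genres": ["Adventure|Animation|Children|Comedy|Fantasy",
               "Adventure|Children|Fantasy"],
})

# Attach movie metadata to every rating; validate guards against duplicate movieIds.
merged = ratings.merge(movies, on="movieId", how="left", validate="many_to_one")

# Placeholder prediction, written as CSV (a file path works the same way).
predictions = pd.DataFrame({"userId": [2], "movieId": [2], "predicted_rating": [3.8]})
buffer = io.StringIO()
predictions.to_csv(buffer, index=False)
print(buffer.getvalue())
```

Passing `index=False` keeps the pandas row index out of the output file.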
Models
- Popularity-based model
- Content-based model
- Collaborative filtering (CF)
- Matrix factorization
- Combined model (SVD + CF)
- Hybrid model
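To give a feel for the simplest entry in the list above, here is a minimal sketch of a popularity-based recommender. It assumes popularity means the highest mean rating among movies with at least `min_ratings` ratings; the notebook may rank differently, and `top_popular` is a hypothetical helper.

```python
import pandas as pd


def top_popular(ratings, n=10, min_ratings=2):
    """Rank movies by mean rating, ignoring movies with too few ratings."""
    stats = ratings.groupby("movieId")["rating"].agg(["count", "mean"])
    eligible = stats[stats["count"] >= min_ratings]
    # Ties on the mean are broken by the number of ratings.
    return eligible.sort_values(["mean", "count"], ascending=False).head(n)


# Toy data using the column names of ratings.csv.
ratings = pd.DataFrame({
    "userId": [1, 2, 3, 1, 2, 3],
    "movieId": [10, 10, 10, 20, 20, 30],
    "rating": [4.0, 5.0, 4.5, 3.0, 3.5, 5.0],
})
top = top_popular(ratings, n=2, min_ratings=2)
print(top)
```

The `min_ratings` threshold keeps a movie with a single 5-star rating from outranking widely rated ones. The same list is recommended to every user, which makes this a common baseline for the personalised models.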
Similarity Metrics
- Cosine similarity
- Mean squared difference (MSD) similarity
- Pearson correlation coefficient (mean-centred cosine similarity)
- Pearson Baseline (uses global baselines for centring instead of means)
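The first three metrics can be sketched in NumPy for two vectors of co-rated items (ratings that both users, or both items, have in common). This is an illustration rather than the Surprise implementation. The MSD form 1 / (MSD + 1) follows Surprise's definition, and returning 0 for a zero denominator is a choice made for this sketch. Pearson Baseline is left out because it needs fitted baseline estimates.

```python
import numpy as np


def cosine_sim(a, b):
    """Cosine of the angle between two rating vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    # Choice made for this sketch: return 0 when a vector has zero length.
    return float(a @ b / denom) if denom > 0 else 0.0


def msd_sim(a, b):
    """Mean squared difference similarity: 1 / (MSD + 1)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    msd = np.mean((a - b) ** 2)
    # The +1 avoids division by zero when the vectors are identical.
    return float(1.0 / (msd + 1.0))


def pearson_sim(a, b):
    """Pearson correlation, computed as cosine similarity of mean-centred vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return cosine_sim(a - a.mean(), b - b.mean())


u = [4.0, 5.0, 3.0]
v = [5.0, 5.0, 2.0]
print(cosine_sim(u, v), msd_sim(u, v), pearson_sim(u, v))
```

Mean-centring is what makes Pearson insensitive to a user who rates everything one star higher than another user with the same tastes.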
Visualizations
- Generate plots using matplotlib and plotly (if uncommented).
License
- MIT License. See LICENSE file.