This repository contains the development of a recommendation system using a movie dataset. It stores information in PostgreSQL and utilizes Python to build a machine learning model that suggests movies based on user preferences, leveraging PostgreSQL, Pandas, and Scikit-learn.
-
Develop an efficient recommendation system for movies based on user preferences.
-
Utilize collaborative and content-based filtering techniques.
-
Store and process data using PostgreSQL for scalability.
-
Perform data analysis and visualization to understand rating trends.
├── sources/
│ ├── queries/ # SQL queries for fetching data
│ ├── models/ # Machine learning models
│ ├── data/ # Raw and processed data
├── notebooks/ # Jupyter notebooks for EDA & model training
├── images/ # Visual assets (plots, schema diagrams, etc.)
├── scripts/ # Automation scripts
├── README.md # Documentation
Ensure you have the following installed:
pip install pandas scikit-learn psycopg2 faker matplotlib seaborn
- DBeaver (optional, for database visualization: Download)
- In this case, we are going to use DBeaver, so the first thing we need to do is create a new connection.
- As shown in the picture, we need to configure the following settings:
- Host: localhost
- Database: postgres
- Port: 5432
- Username: postgres
- Password: YOUR PASSWORD
- After setting the parameters, we should test the connection to ensure it works before applying the configuration.
- movie_id (INT, PRIMARY KEY, AUTO_INCREMENT) - Unique identifier for each movie.
- title (VARCHAR(255)) - Title of the movie.
- release_year (INT) - Year the movie was released.
- genre (VARCHAR(100)) - Genre of the movie.
- description (TEXT) - Brief description of the movie.
- director (VARCHAR(100)) - Director of the movie.
- duration (INT) - Duration in minutes.
- user_id (INT, PRIMARY KEY, AUTO_INCREMENT) - Unique identifier for each user.
- username (VARCHAR(50)) - Username.
- email (VARCHAR(100)) - User's email address.
- password (VARCHAR(255)) - Encrypted password.
- join_date (DATETIME) - Date the user registered.
- rating_id (INT, PRIMARY KEY, AUTO_INCREMENT) - Unique identifier for each rating.
- user_id (INT, FOREIGN KEY) - Reference to the user who rated the movie.
- movie_id (INT, FOREIGN KEY) - Reference to the rated movie.
- rating (DECIMAL(3, 2)) - Rating given to the movie (e.g., from 1.0 to 5.0).
- rating_date (DATETIME) - Date the rating was made.
movies
andratings
are related throughmovie_id
, allowing storage of ratings given to each movie.users
andratings
are related throughuser_id
, tracking which user gave a rating.
- Collaborative Filtering: Uses ratings data to identify user preferences.
- User-Based vs Item-Based Filtering:
- User-based: Finds similar users based on ratings.
- Item-based: Suggests similar movies based on ratings.
- Hybrid Approach: Combines genre, director, and rating info.
SELECT m.title, AVG(r.rating) AS avg_rating
FROM ratings r
JOIN movies m ON r.movie_id = m.movie_id
WHERE r.user_id IN (
SELECT DISTINCT r1.user_id
FROM ratings r1
WHERE r1.movie_id = <USER_LAST_WATCHED_MOVIE>
)
GROUP BY m.title
ORDER BY avg_rating DESC
LIMIT 10;
-
RMSE (Root Mean Squared Error) - Measures prediction accuracy.
-
Precision@K, Recall@K - Evaluate recommendation relevance.
-
Confusion Matrix - Analyzes classification errors.
-
User passwords are stored securely using hashing (e.g., bcrypt).
-
Database access is restricted to authorized users.
-
Sensitive user information is never exposed in raw format.
-
Implement deep learning for better recommendations.
-
Improve handling of cold-start problems for new users/movies.
-
Integrate real-time recommendations using streaming data.
- Postgres Documentation
- Faker Documentation
- Random Documentation
- Psycopg Documentation
- Pandas Documentation
- ReportLab Documentation
Contributions are welcome! 🎉
Feel free to submit issues or pull requests to enhance the functionality.
- Fork the repository.
- Create a feature branch (git checkout -b feature-branch).
- Commit your changes (git commit -m "Added new feature").
- Push to the branch (git push origin feature-branch).
- Open a Pull Request.
- For questions or suggestions, open an issue or contact me via email. 🚀