I collected data on over 3000 movies from the API provided by TMDB.
| | id | title | genres | overview | adult | release_year | poster_url | keywords | cast | director | popularity |
---|---|---|---|---|---|---|---|---|---|---|---|
95 | 157336 | Interstellar | [Adventure, Drama, Science Fiction] | The adventures of a group of explorers who make use of a newly discovered wormhole... | False | 2014 | https://image.tmdb.org/t/p/w500/gEU2QniE6E77NI6lCU6MxlNBvIx.jpg | [artificial intelligence, nasa, time warp, spacecraft, expedition, future,... | [[Matthew McConaughey, /sY2mwpafcwqyYS1sOySu1MENDse.jpg], [Timothée Chalamet, /BE2sdjpgsa2rNTFa66f7upkaOP.jpg]... | [[Christopher Nolan, /xuAIuYSmsUzKlUMBFGVZaWsY3DZ.jpg]] | 128.429 |
After a bit more cleaning of the collected data, I created a column named 'tags' that contains string versions of the genres, overview, keywords, cast, and director, all separated by a space (' ').
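As a rough illustration of the 'tags' step, the sketch below joins the fields of one row into a single string. The `build_tags` helper, the lowercasing, and the shortened overview are assumptions made for the example, not the project's actual code; the cast and director layout follows the sample row above.

```python
def build_tags(movie):
    """Join genres, overview, keywords, cast and director into one string."""
    # cast and director entries are [name, profile_image_path] pairs; keep only the name.
    cast_names = [entry[0] for entry in movie["cast"]]
    director_names = [entry[0] for entry in movie["director"]]

    parts = (
        movie["genres"]
        + [movie["overview"]]
        + movie["keywords"]
        + cast_names
        + director_names
    )
    # Lowercasing is an assumption; it keeps "Drama" and "drama" as one token later on.
    return " ".join(parts).lower()


movie = {
    "genres": ["Adventure", "Drama", "Science Fiction"],
    "overview": "Explorers travel through a wormhole.",  # shortened placeholder text
    "keywords": ["nasa", "time warp"],
    "cast": [["Matthew McConaughey", "/sY2mwpafcwqyYS1sOySu1MENDse.jpg"]],
    "director": [["Christopher Nolan", "/xuAIuYSmsUzKlUMBFGVZaWsY3DZ.jpg"]],
}
tags = build_tags(movie)
print(tags)
```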
Next, I stemmed the words in 'tags' and applied count vectorization limited to 6000 features, which produced the feature vector for each movie.
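To make the stemming and count vectorization steps concrete, here is a dependency-free sketch. The `toy_stem` function is a deliberately crude stand-in for a real stemmer (a real pipeline would more likely use library implementations such as NLTK's `PorterStemmer` and scikit-learn's `CountVectorizer`), and all function names and the two sample documents are made up for illustration.

```python
from collections import Counter


def toy_stem(word):
    """Strip a few common suffixes. A crude stand-in for a real stemmer."""
    for suffix in ("ing", "ers", "er", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase, split on whitespace, and stem every word."""
    return [toy_stem(word) for word in text.lower().split()]


def build_vocabulary(docs, max_features):
    """Keep the max_features most frequent stems across all documents."""
    totals = Counter()
    for doc in docs:
        totals.update(tokenize(doc))
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:max_features]]


def vectorize(doc, vocabulary):
    """Count how often each vocabulary term occurs in the document."""
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocabulary]


docs = ["explorers exploring space", "space adventure"]
vocabulary = build_vocabulary(docs, max_features=2)  # the project uses 6000
vectors = [vectorize(doc, vocabulary) for doc in docs]
print(vocabulary)
print(vectors)
```

Stemming maps "explorers" and "exploring" to the same token, so the two words count as one feature, and the feature cap drops the least frequent terms.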
There are many ways to make content-based recommendations; I used cosine similarity.
Other options include k-nearest neighbors and Annoy (Approximate Nearest Neighbors Oh Yeah) from Spotify.
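The idea behind recommending with cosine similarity can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the `recommend` helper, the movie names, and the tiny count vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def recommend(title, vectors, top_n=2):
    """Return the top_n titles whose vectors are most similar to the given title."""
    query = vectors[title]
    scores = [
        (other, cosine_similarity(query, vector))
        for other, vector in vectors.items()
        if other != title
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scores[:top_n]]


# Hypothetical count vectors over a four-term vocabulary.
vectors = {
    "Movie A": [2, 1, 0, 0],
    "Movie B": [1, 1, 0, 0],
    "Movie C": [0, 0, 3, 1],
    "Movie D": [1, 0, 1, 0],
}
recommendations = recommend("Movie A", vectors)
print(recommendations)
```

With a few thousand movies, the pairwise similarities can be computed once and stored, which avoids repeating the work on every request.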
file name | description | link |
---|---|---|
similarity_df.parquet | calculated similarity data frame | https://github.com/rohit-krish/Movie-Recommendation/raw/main/app/website/static/similarity_df.parquet |
combined.parquet | collected data | https://github.com/rohit-krish/Movie-Recommendation/raw/main/data/combined.parquet |
git clone git@github.com:rohit-krish/Movie-Recommendation.git
cd Movie-Recommendation
pip install -r ./requirements.txt
Before anything else, create a TMDB account and paste your API key into a .env file:
echo "API_KEY=<your_api_key>" > .env
cd app
python main.py
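For readers curious how a `KEY=VALUE` file like the .env created above can be consumed, here is a small standard-library sketch. It is an assumption about the general approach, not a description of how this project loads the key (it may well use a package such as python-dotenv); `read_env_file` and the `example-key` value are hypothetical.

```python
import os
import tempfile


def read_env_file(path):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Demonstrate with a throwaway file so no real .env is read or overwritten.
with tempfile.TemporaryDirectory() as tmp:
    env_path = os.path.join(tmp, ".env")
    with open(env_path, "w", encoding="utf-8") as handle:
        handle.write("API_KEY=example-key\n")
    settings = read_env_file(env_path)

api_key = settings.get("API_KEY")
if not api_key:
    raise SystemExit("API_KEY is missing from .env")
print("API key loaded")
```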
- Install Nginx
sudo apt install nginx
- Create a configuration for the Nginx web server
This configuration sets Nginx up as a reverse proxy for our Flask application.
We put Nginx in front because Gunicorn, with its default synchronous workers, is vulnerable to DoS and DDoS attacks (for example, slow clients tying up every worker). Nginx handles connections asynchronously, so it can act as a layer of defense in front of the Flask web server.
Nginx and the configuration below do not make the app safe on their own. The app has known vulnerabilities: for instance, the front end passes the server's response to the JavaScript `document.write` function, so an attacker who tampers with that response (for example through a man-in-the-middle attack) could inject arbitrary content into the page. I have not addressed this because it is a hobby project, but anyone who wants to use it should be aware of it.
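# sudo does not apply to a shell redirect, so write the file with tee instead;
# the quoted 'EOF' stops the shell from expanding $host and $proxy_add_x_forwarded_for
sudo tee /etc/nginx/sites-enabled/flask_app > /dev/null <<'EOF'
server {
    listen 80;
    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
EOF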
sudo nginx -t # check whether the syntax is correct or not
sudo nginx -s reload
That completes the Nginx part. Next, create and run the Flask app with Gunicorn on http://127.0.0.1:8000 or http://0.0.0.0:8000.
git clone git@github.com:rohit-krish/Movie-Recommendation.git
cd Movie-Recommendation
pip install -r ./requirements.txt
Before anything else, create a TMDB account and paste your API key into a .env file:
echo "API_KEY=<your_api_key>" > .env
cd app
flask --app main:app run # test whether the flask is working fine or not
- Create a Gunicorn config file
echo "bind = '0.0.0.0:8000'
workers = 3 # Adjust the number of workers as needed
daemon = True # to run the app in the background
" > gunicorn_config.py # This file should be in the `app` directory
gunicorn -c gunicorn_config.py main:app
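Because a Gunicorn config file is ordinary Python, the worker count can be computed rather than hard-coded. The variant below is a sketch, not the project's file: it follows the `(2 x cores) + 1` starting point suggested in the Gunicorn documentation and, as an assumption about the deployment, binds to the loopback address so that only the Nginx proxy can reach Gunicorn.

```python
# gunicorn_config.py (alternative sketch; place it in the `app` directory)
import multiprocessing

# Listen on loopback only, matching proxy_pass http://127.0.0.1:8000 in the Nginx config.
bind = "127.0.0.1:8000"

# Starting point from the Gunicorn docs: (2 x number of cores) + 1 workers.
workers = multiprocessing.cpu_count() * 2 + 1

# Run the app in the background, as in the original config.
daemon = True
```

On a single-core machine this gives the same three workers as the hand-written config above.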
sudo pkill -f gunicorn
sudo pkill -f gunicorn3
# or
sudo killall gunicorn gunicorn3
- Increase the initial movie options
- Implement the search feature
- Fix the search box width on mobile screen sizes
- Reduce the size of pictures on mobile screen sizes, where they are currently too large
- Add placeholders before loading the complete UI
- Show the full movie list when the search bar is clicked