Bachelor

Name: Chi Sam Mac

Student-number: s2588382

How to run the classifier

First, you need all the subtitle files (directory "subtitles", with one subdirectory per genre: Comedy, Drama, Documentary, Thriller, etc.)

The Doc2vec model: d2v_150.model

The movie_classification.py file.
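The classifier reads subtitles from one subdirectory per genre under "subtitles". As a minimal sketch of how such a layout can be turned into a labelled corpus, assuming (this README does not say so) that each movie is stored as a single .srt file, the hypothetical helpers `srt_to_text` and `load_corpus` below drop cue numbers, timestamps and inline tags and use the directory name as the label:

```python
import re
from pathlib import Path

# Hypothetical helpers: assumes one .srt file per movie in subtitles/<Genre>/.
CUE_NUMBER = re.compile(r"^\d+$")  # cue index lines such as "12"
TIMESTAMP = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)  # lines such as "00:01:02,500 --> 00:01:04,000"
TAG = re.compile(r"<[^>]+>")  # inline formatting such as <i>...</i>


def srt_to_text(srt: str) -> str:
    """Keep only the spoken lines of an SRT file, joined by spaces.

    Note: a dialogue line consisting only of digits is dropped as well,
    because it cannot be told apart from a cue number.
    """
    kept = []
    for line in srt.splitlines():
        line = line.strip()
        if not line or CUE_NUMBER.match(line) or TIMESTAMP.match(line):
            continue
        kept.append(TAG.sub("", line))
    return " ".join(kept)


def load_corpus(root: str, genres: list[str]) -> tuple[list[str], list[str]]:
    """Read every *.srt file under root/<genre>/ and return (documents, labels)."""
    docs, labels = [], []
    for genre in genres:
        for path in sorted(Path(root, genre).glob("*.srt")):
            # utf-8-sig also strips a byte-order mark, common in subtitle files
            raw = path.read_text(encoding="utf-8-sig", errors="ignore")
            docs.append(srt_to_text(raw))
            labels.append(genre)
    return docs, labels
```

For example, `docs, labels = load_corpus("subtitles", ["Comedy", "Drama"])` would yield one plain-text document per subtitle file, labelled with its genre directory.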

Then run movie_classification.py*, passing the genres to classify as arguments, e.g.: python3 movie_classification.py Comedy Drama Documentary Horror

*The POS-tag feature is commented out to make the program faster, as it does not improve the classifier's performance. To turn it on, uncomment the following lines in movie_classification.py: 35, 392-412, 427, 534-537, 549.
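The genre names on the command line presumably select which subdirectories of "subtitles" are used. A minimal argparse sketch of such an interface follows; `parse_genres` and the accepted genre list are assumptions based on the directories named in this README, not the script's actual code:

```python
import argparse

# Assumed set of genres: the subdirectories of "subtitles" listed in this README.
KNOWN_GENRES = ["Action", "Comedy", "Documentary", "Drama", "Horror", "Thriller"]


def parse_genres(argv=None) -> list[str]:
    """Parse genre names from the command line (sys.argv[1:] when argv is None)."""
    parser = argparse.ArgumentParser(
        description="Classify movies by genre from their subtitles."
    )
    parser.add_argument(
        "genres",
        nargs="+",
        choices=KNOWN_GENRES,  # unknown genres make argparse exit with an error
        help="genres (subdirectories of subtitles/) to classify between",
    )
    args = parser.parse_args(argv)
    # Drop duplicates while keeping the order given on the command line.
    return list(dict.fromkeys(args.genres))


if __name__ == "__main__":
    print(parse_genres())
```

Restricting the arguments with `choices` makes a typo such as "Comdy" fail immediately instead of silently producing an empty class.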

Files necessary to gather all data from scratch

Directory "subtitles", containing one directory per genre ("Action", "Comedy", "Documentary", "Drama", "Horror", "Thriller")

movies.csv (MovieLens data, roughly 9k different movies)

movies2.csv (MovieLens data, roughly 27k different movies)

transform_data.py

get_subs.py

scrapper.py

doc2vec.py

grid_search.py (grid search for the parameters of the algorithms)

gs_feature_weights.py (grid search for the feature weights)

base_program.py (classifier with a basic bag-of-words (BOW) model)

movie_classification.py
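base_program.py is described only as a classifier with a basic BOW model. As an illustration of what such a baseline can look like, here is a stdlib-only multinomial Naive Bayes over bag-of-words counts; the choice of Naive Bayes, the tokenizer and the class name `BowNaiveBayes` are assumptions, not taken from base_program.py:

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphabetic tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


class BowNaiveBayes:
    """Hypothetical baseline: multinomial Naive Bayes with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # documents per genre
        self.word_counts = defaultdict(Counter)   # genre -> word -> count
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, doc: str) -> str:
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / n_docs)  # log prior of the genre
            for word in tokenize(doc):
                if word in self.vocab:  # words never seen in training are ignored
                    count = self.word_counts[label][word]
                    score += math.log((count + 1) / (self.totals[label] + v))
            if score > best_score:
                best, best_score = label, score
        return best
```

Usage: `model = BowNaiveBayes().fit(docs, labels)` followed by `model.predict(text)`. Working in log space avoids numeric underflow on long subtitle documents.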

Steps to gather all data from scratch and run the classifier

Step 1: python3 transform_data.py # selects the movies to get subtitles for

Step 2: python3 get_subs.py # downloads the subtitles for the movies selected in Step 1

Step 3: python3 doc2vec.py # creates a Doc2vec model

Step 4: python3 movie_classification.py, with the genres as arguments as shown above (grid_search.py and gs_feature_weights.py are not needed: they were already used to set the parameters and feature weights in movie_classification.py)
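Step 1 is documented only by its comment. As a minimal sketch, assuming the standard MovieLens movies.csv layout (columns movieId, title, genres, with genres separated by "|") and a hypothetical rule that keeps only movies tagged with exactly one of the six target genres, so that every movie gets a single label, the selection could look like this (`select_movies` is not a function from transform_data.py):

```python
import csv

# Assumed target genres: the genre directories listed in this README.
TARGET_GENRES = {"Action", "Comedy", "Documentary", "Drama", "Horror", "Thriller"}


def select_movies(lines, targets=TARGET_GENRES):
    """Return (movieId, title, genre) for movies with exactly one target genre.

    `lines` is any iterable of CSV lines, e.g. an open movies.csv file.
    """
    selected = []
    for row in csv.DictReader(lines):
        genres = set(row["genres"].split("|")) & targets
        if len(genres) == 1:  # skip movies that are ambiguous or off-target
            selected.append((row["movieId"], row["title"], genres.pop()))
    return selected


if __name__ == "__main__":
    with open("movies.csv", newline="", encoding="utf-8") as f:
        for movie in select_movies(f)[:10]:
            print(movie)
```

Using `csv.DictReader` rather than splitting on commas matters here, because MovieLens titles such as "American President, The (1995)" contain quoted commas.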
