Movies-Dataset

A collection of movies dataset for your ML project or any other task.

The datasets are available in CSV format under /data.

Alternatively, you can generate your own using the procedure given below.

Installation

  1. Clone the repository

    git clone https://github.com/Simatwa/movies-dataset.git
    cd movies-dataset

  2. Install the requirements

    pip install -r requirements.txt

Usage

Running python -m data_hunter retrieves data from the two available categories (Hollywood and Bollywood) across all available genres, i.e. Action, Adventure, Animation, Biography, Comedy, Crime, Documentary, Drama, Family, Fantasy, Film-Noir, History, Horror, Music, Musical, Mystery, Romance, Sci-Fi, Sport, Thriller, War and Western.

You can pass other options, such as --limit and --dir, to meet your specific needs.

For more usage info, run the command with --help:

$ python -m data_hunter --help
usage: data-hunter [-h]
                   [-g [[Action|Adventure|Animation|Biography|Comedy|Crime|Documentary|Drama|Family|Fantasy|Film-Noir|History|Horror|Music|Musical|Mystery|Romance|Sci-Fi|Sport|Thriller|War|Western] ...]]
                   [-c [[Bollywood|Bollywood|_] ...]] [-l LIMIT]
                   [-d DIR] [-p PREFIX] [-q] [-w] [-t] [-v]

A collection of movies dataset for your ML project or any other
task.

options:
  -h, --help            show this help message and exit
  -g, --genres [[Action|Adventure|Animation|Biography|Comedy|Crime|Documentary|Drama|Family|Fantasy|Film-Noir|History|Horror|Music|Musical|Mystery|Romance|Sci-Fi|Sport|Thriller|War|Western] ...]
                        Movie genres - ['_']
  -c, --categories [[Bollywood|Bollywood|_] ...]
                        Movie category - ['_']
  -l, --limit LIMIT     Total movies per genre(multiple of 20) -
                        1000000
  -d, --dir DIR         Parent directory to save the datasets to -
                        /home/smartwa/git/smartwa/movies-dataset
  -p, --prefix PREFIX   Datasets filename prefix -
  -q, --quiet           Do not stdout any informative texts - False
  -w, --overwrite       Clear all $prefix*.csv file in the $dir -
                        False
  -t, --trace           Maintain trace of the hunting progress -
                        False
  -v, --version         show program's version number and exit
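
As an illustration of the options listed above, here is a minimal sketch that assembles a data_hunter invocation from Python. The helper name build_command is hypothetical and not part of this repository. It only uses flags shown in the help text, and the multiple-of-20 check is an assumption based on the --limit description rather than confirmed behaviour of the tool.

```python
import sys


def build_command(genres=(), limit=None, out_dir=None, prefix=None, quiet=False):
    """Build an argv list for `python -m data_hunter` (hypothetical helper)."""
    cmd = [sys.executable, "-m", "data_hunter"]
    if genres:
        # -g accepts one or more genre names, e.g. Action Comedy
        cmd += ["-g", *genres]
    if limit is not None:
        # The help text describes LIMIT as a multiple of 20 (assumed to be required).
        if limit <= 0 or limit % 20 != 0:
            raise ValueError("limit should be a positive multiple of 20")
        cmd += ["-l", str(limit)]
    if out_dir:
        cmd += ["-d", str(out_dir)]
    if prefix:
        cmd += ["-p", prefix]
    if quiet:
        cmd.append("-q")
    return cmd


command = build_command(genres=["Action", "Comedy"], limit=40, out_dir="datasets")
print(" ".join(command[1:]))
```

To actually run the hunt, the resulting list can be passed to subprocess.run(command, check=True) from inside the cloned repository.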

There's no guarantee that the generated data is correct.

Direct Links

No. Genre Link
1 Action https://raw.githubusercontent.com/Simatwa/movies-dataset/main/data/action.csv
2 Adventure https://raw.githubusercontent.com/Simatwa/movies-dataset/main/data/adventure.csv
3 Animation https://raw.githubusercontent.com/Simatwa/movies-dataset/main/data/animation.csv
4 Biography https://raw.githubusercontent.com/Simatwa/movies-dataset/main/data/biography.csv
5 Comedy https://raw.githubusercontent.com/Simatwa/movies-dataset/main/data/comedy.csv
6 Crime https://raw.githubusercontent.com/Simatwa/movies-dataset/main/data/crime.csv
7 Documentary https://raw.githubusercontent.com/Simatwa/movies-dataset/main/data/documentary.csv
8 Drama https://raw.githubusercontent.com/Simatwa/movies-dataset/main/data/drama.csv
9 Family https://raw.githubusercontent.com/Simatwa/movies-dataset/main/data/family.csv
10 Fantasy https://raw.githubusercontent.com/Simatwa/movies-dataset/main/data/fantasy.csv
11 Film-Noir https://raw.githubusercontent.com/Simatwa/movies-dataset/main/data/film-noir.csv
12 History https://raw.githubusercontent.com/Simatwa/movies-dataset/main/data/history.csv
13 Horror https://raw.githubusercontent.com/Simatwa/movies-dataset/main/data/horror.csv
14 Music https://raw.githubusercontent.com/Simatwa/movies-dataset/main/data/music.csv
15 Musical https://raw.githubusercontent.com/Simatwa/movies-dataset/main/data/musical.csv
16 Mystery https://raw.githubusercontent.com/Simatwa/movies-dataset/main/data/mystery.csv
17 Romance https://raw.githubusercontent.com/Simatwa/movies-dataset/main/data/romance.csv
18 Sci-Fi https://raw.githubusercontent.com/Simatwa/movies-dataset/main/data/sci-fi.csv
19 Sport https://raw.githubusercontent.com/Simatwa/movies-dataset/main/data/sport.csv
20 Thriller https://raw.githubusercontent.com/Simatwa/movies-dataset/main/data/thriller.csv
21 War https://raw.githubusercontent.com/Simatwa/movies-dataset/main/data/war.csv
22 Western https://raw.githubusercontent.com/Simatwa/movies-dataset/main/data/western.csv
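
The links above follow one pattern: the lowercased genre name plus .csv. The sketch below relies on that pattern to build a link and peek at a file using only the standard library. The function names are illustrative, UTF-8 encoding is an assumption, and no column names are assumed because the README does not list them.

```python
import csv
import io
import itertools
import urllib.request

BASE_URL = "https://raw.githubusercontent.com/Simatwa/movies-dataset/main/data"


def genre_url(genre):
    """Map a genre name such as 'Sci-Fi' to its raw CSV link."""
    return f"{BASE_URL}/{genre.strip().lower()}.csv"


def fetch_rows(genre, max_rows=5):
    """Download one genre file and return its header and the first few rows."""
    with urllib.request.urlopen(genre_url(genre), timeout=30) as response:
        # Decode the byte stream lazily so large files are not read in full.
        text = io.TextIOWrapper(response, encoding="utf-8", newline="")
        reader = csv.reader(text)
        header = next(reader, [])
        rows = list(itertools.islice(reader, max_rows))
    return header, rows


print(genre_url("Sci-Fi"))
```

Calling fetch_rows("Action") needs network access and returns whatever header the file actually contains, which is a quick way to inspect the columns before loading the data elsewhere.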

Combined Datasets Link

No. Identity Link
1 Combined csv https://raw.githubusercontent.com/Simatwa/movies-dataset/main/data/combined.csv
2 Sqlite3 Database https://raw.githubusercontent.com/Simatwa/movies-dataset/main/data/combined.db
3 Sqlite3 Database (Relational) https://raw.githubusercontent.com/Simatwa/movies-dataset/main/data/combined-relational.db
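
Since the table and column names inside the two database files are not documented here, a reasonable first step after downloading one is to list what it contains. The sketch below does that with Python's built-in sqlite3 module. The function name describe_database and the local file name in the usage note are assumptions.

```python
import sqlite3
from contextlib import closing


def describe_database(db_path):
    """Return {table_name: [column names]} for a SQLite database file."""
    schema = {}
    with closing(sqlite3.connect(db_path)) as conn:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        for name in names:
            # Quote the identifier so unusual table names do not break the PRAGMA.
            quoted = '"' + name.replace('"', '""') + '"'
            columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
            schema[name] = [column[1] for column in columns]
    return schema
```

For example, describe_database("combined.db") on a downloaded copy returns the tables and their columns. Note that sqlite3.connect creates an empty database if the path does not exist, in which case the result is an empty dictionary.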

Further details

To make work easier, there's a CLI that comes in handy for manipulating the data. Its uses include, but are not limited to, converting the datasets to various formats and combining them into a single sqlite3 database.
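
The CLI's own commands are not shown here, so the following is not its implementation. It is only a rough sketch of the idea of piling several CSV files into one sqlite3 database, using the standard library. The function name, the table name movies, the extra source_file column and the assumption that all files share the same header are illustrative choices.

```python
import csv
import sqlite3
from contextlib import closing
from pathlib import Path


def quote(identifier):
    """Quote a table or column name for use in SQL."""
    return '"' + identifier.replace('"', '""') + '"'


def pile_into_sqlite(csv_paths, db_path, table="movies"):
    """Load CSV files sharing one header into a single table; return rows inserted."""
    inserted = 0
    columns = None
    with closing(sqlite3.connect(db_path)) as conn:
        for path in map(Path, csv_paths):
            with path.open(newline="", encoding="utf-8") as handle:
                reader = csv.reader(handle)
                header = next(reader, None)
                if header is None:
                    continue  # skip empty files
                if columns is None:
                    # The first file's header defines the table; all values stay TEXT.
                    columns = header
                    column_sql = ", ".join(f"{quote(c)} TEXT" for c in columns)
                    conn.execute(
                        f"CREATE TABLE IF NOT EXISTS {quote(table)} "
                        f"({column_sql}, source_file TEXT)"
                    )
                elif header != columns:
                    raise ValueError(f"{path} has a different header")
                placeholders = ", ".join("?" for _ in range(len(columns) + 1))
                # Rows with the wrong number of fields are skipped; the file stem
                # (e.g. 'action') is stored so the origin of each row is kept.
                rows = [row + [path.stem] for row in reader if len(row) == len(columns)]
                conn.executemany(
                    f"INSERT INTO {quote(table)} VALUES ({placeholders})", rows
                )
                inserted += len(rows)
        conn.commit()
    return inserted
```

A call such as pile_into_sqlite(["data/action.csv", "data/comedy.csv"], "my-movies.db") would then produce one table holding the rows of both files.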