A/ Data
Data link: https://husteduvn-my.sharepoint.com/:f:/g/personal/hoang_nv194434_sis_hust_edu_vn/EtbisHI7TrxOqDujhWaNLMYB0vaUIyKaEC-GK8tlXvyHVg?e=Lu4xM1
The data folder in the link is structured as follows:
——AnimeEDA
————data
——————csv : contains the final data, all in csv format.
——————merged: contains the merged data.
——————preprocessed: contains the individual preprocessed data of each database.
——————raw: contains the individual scraped data of each database.
——————reference: by-product of the merging phase.
——————split: by-product of the merging phase.
——————temporary: by-product of the merging phase.
——————keywords.json: by-product of the merging phase.
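As a starting point for working with the final data, the sketch below reads every csv file in a folder into memory using only the Python standard library. The function name load_csv_tables is hypothetical and not part of this repository, and the file names inside the csv folder are not assumed.

```python
import csv
from pathlib import Path


def load_csv_tables(csv_dir):
    """Read every .csv file in csv_dir into a list of row dictionaries.

    Returns a dict that maps each file's stem (its name without the
    extension) to its rows, so a table can be looked up by name.
    """
    tables = {}
    for path in sorted(Path(csv_dir).glob("*.csv")):
        # newline="" is the setting the csv module recommends for file objects.
        with path.open(newline="", encoding="utf-8") as handle:
            tables[path.stem] = list(csv.DictReader(handle))
    return tables
```

Assuming the downloaded AnimeEDA folder sits in the current working directory, a call such as load_csv_tables("AnimeEDA/data/csv") would return one entry per csv file. All values are read as strings, so numeric columns still need converting.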
B/ Code
Scrape folder
This folder contains the code we used to scrape data from websites.
* anime: A Scrapy project that follows the standard Scrapy project layout. It contains the code needed to scrape data from AniList and AniSearch.
* scrape_mal_anime.py: Code to get data from MyAnimeList using the API it provides.
* kitsusearch.py: Code to get data from Kitsu using the API it provides.
Preprocess_And_Integrate folder
This folder contains the code to preprocess the data from each source and merge it.
* merge_mal_data.py: merge the multiple MyAnimeList files in the raw folder.
* preprocess_mal.py: preprocess all attributes of the MyAnimeList data.
* preprocess_anilist.py: preprocess all attributes of the AniList data.
* preprocess_kitsu.py: preprocess all attributes of the Kitsu data.
* preprocess_anisearch.py: preprocess all attributes of the AniSearch data.
* data_merge.py: merge the preprocessed data from all sources.
* singlevalued_multivalued_split.py: split the single-valued and multi-valued attributes of the merged data into separate files for later preprocessing.
* data_normalize.py: convert the JSON files of multi-valued attributes into dummy-coded tables, then save them as csv files.
* genres_preprocess.ipynb: preprocess the genre attribute after merging and normalization.
* Preprocess_studios.ipynb: preprocess the studio attribute after merging and normalization.
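The split and dummy-coding steps can be illustrated with a minimal sketch. This is not the code in singlevalued_multivalued_split.py or data_normalize.py. It uses only the standard library, the function names are hypothetical, and it assumes each record is a dictionary with an "id" key in which multi-valued attributes are stored as lists. The sample records are made up for illustration.

```python
import csv
import io


def split_attributes(records, id_key="id"):
    """Separate records into single-valued and multi-valued parts.

    An attribute counts as multi-valued when its value is a list in at
    least one record (an assumption of this sketch). The id is kept in
    both parts so the rows can be joined again later.
    """
    multi_keys = {
        key
        for record in records
        for key, value in record.items()
        if isinstance(value, list)
    }
    single = [{k: v for k, v in r.items() if k not in multi_keys} for r in records]
    multi = [
        {k: v for k, v in r.items() if k == id_key or k in multi_keys}
        for r in records
    ]
    return single, multi


def dummy_code(records, key, id_key="id"):
    """Build a dummy-coded table: one 0/1 column per distinct category."""
    categories = sorted({c for r in records for c in r.get(key, [])})
    header = [id_key] + categories
    rows = [
        [r[id_key]] + [1 if c in r.get(key, []) else 0 for c in categories]
        for r in records
    ]
    return header, rows


def to_csv_text(header, rows):
    """Serialize a header and rows as csv text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up records, for illustration only.
records = [
    {"id": 1, "title": "Example A", "genres": ["Action", "Drama"]},
    {"id": 2, "title": "Example B", "genres": ["Comedy"]},
    {"id": 3, "title": "Example C", "genres": []},
]
single, multi = split_attributes(records)
header, rows = dummy_code(multi, "genres")
csv_text = to_csv_text(header, rows)
```

Each distinct genre becomes its own column, and a record with an empty list gets a row of zeros.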
Analysis folder
This folder contains the code to visualize the data and the plots that this code generates.
* Plots: This subfolder contains all the PNG images of our data visualizations.
* Barcharts_Heatmaps.ipynb: bar charts, correlation heatmaps, and successful score heatmaps for the genre, studio, and voice actor attributes.
* Distribution_and_Correlation.ipynb: distribution plot with measures of central tendency for the mean score, plus a correlation heatmap.
* boxplot_score_by_genre_media.ipynb: box plots of the mean anime score by genre and media type.
* viz_scores_wrt_time.py: scatter plots of score and popularity with respect to time.
* viz_score_years.py: box plots of scores for each year.
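To show what a per-year box plot summarizes, the sketch below groups scores by year and computes the minimum, quartiles, and maximum for each year. It is not the code in viz_score_years.py. The function name and the (year, score) input format are assumptions, and it uses only the standard library rather than a plotting package.

```python
from collections import defaultdict
from statistics import quantiles


def box_stats_by_year(rows):
    """Group (year, score) pairs by year and compute box-plot statistics.

    Returns {year: (minimum, q1, median, q3, maximum)}. Years with fewer
    than two scores are skipped, since quartiles need at least two values.
    """
    by_year = defaultdict(list)
    for year, score in rows:
        by_year[year].append(score)

    stats = {}
    for year, scores in sorted(by_year.items()):
        if len(scores) < 2:
            continue
        # "inclusive" treats the data as the full population and
        # interpolates linearly between neighboring values.
        q1, median, q3 = quantiles(scores, n=4, method="inclusive")
        stats[year] = (min(scores), q1, median, q3, max(scores))
    return stats
```

The minimum and maximum here are the plain extremes of the data. Plotting libraries often draw whiskers differently, for example at 1.5 times the interquartile range, and mark points beyond them as outliers.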