Note

Python 3.9 or above is required to use this project.

Clone this repository¹ or download the source code and proceed as given below.

Note

Before running the commands shown, make sure you are in the project's root directory, i.e. ~\YT-Comments-Clustering\. Run every command from this path.

Prerequisites

Before we move on to clustering a video's comments, we first need to fetch them.

Google thankfully provides the YouTube Data API, which can be used for a multitude of purposes. For this project, however, we only need it to fetch the comments on a specific YouTube video, along with their respective like counts.
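To give a feel for what such a request looks like, here is a minimal sketch that builds the URL for one page of the API's `commentThreads` endpoint. The helper name `build_comment_threads_url` is hypothetical and is not taken from this project; the extractor script may well build its requests differently (for example through a client library).

```python
from urllib.parse import urlencode

# Public endpoint of the YouTube Data API v3 for top-level comment threads.
COMMENT_THREADS_URL = "https://www.googleapis.com/youtube/v3/commentThreads"


def build_comment_threads_url(video_id, api_key, page_token=None):
    """Return the request URL for one page of comment threads (hypothetical helper)."""
    params = {
        "part": "snippet",          # the snippet carries the comment text and like count
        "videoId": video_id,
        "maxResults": 100,          # largest page size the endpoint accepts
        "textFormat": "plainText",
        "key": api_key,
    }
    if page_token:
        params["pageToken"] = page_token  # continue from a previous page
    return f"{COMMENT_THREADS_URL}?{urlencode(params)}"


url = build_comment_threads_url("IUTGFQpKaPU", "your_api_key_here")
print(url)
```

Sending a GET request to this URL with a valid key returns a JSON page of comment threads; a `nextPageToken` in the response, when present, is passed back as `page_token` to fetch the next page.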

Setting up the API key

1. Follow the steps as described here to set up a new API key.
2. Navigate to the following path within the project, and create a file named API_KEY.py.
   ```
   ~\YT-Comments-Clustering\src\comments-extractor\API_KEY.py
   ```
3. In this file, create a variable named API_KEY and assign it the API key obtained in step 1, as a string.
   ```
   API_KEY = "your_api_key_here"
   ```

And you're done with the API key setup!

Installing requirements

It's preferable to use this project in a separate virtual environment of its own, because some of the modules it uses depend on specific versions of other modules.

Creating a virtual environment

Note

On a Mac or Linux machine, use pip3 and python3; on Windows, use the usual pip and python.

Navigate to the project root ~\YT-Comments-Clustering and execute:
```
python -m venv project-venv
```

To activate this venv on Mac/Linux:
```
source project-venv/bin/activate
```

or, on Windows:
```
project-venv\Scripts\activate
```

When you are done using the project, deactivate the venv:
```
deactivate
```

Important

All our requirements will now be installed within this venv, namely project-venv. Make sure to activate it whenever you wish to use this project.
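If you are ever unsure whether the venv is active, a quick check from Python can help. The sketch below relies on the standard-library behaviour that `sys.prefix` differs from `sys.base_prefix` inside a virtual environment; the function name is our own and is not part of this project.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the venv while sys.base_prefix
    # still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if in_virtual_environment():
    print(f"Active environment: {sys.prefix}")
else:
    print("No virtual environment is active; activate project-venv first.")
```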

Installing dependencies

Run the following commands:
```
pip install -r requirements.txt
```
```
python
>>> import nltk
>>> nltk.download('stopwords')
```
```
python -m spacy download en_core_web_sm
```

And you are done with the requirements' setup! The project's now ready to use!
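As an optional sanity check, the sketch below reports whether the Python version is recent enough and whether the two packages named in the install steps can be found. The helper `missing_modules` is hypothetical; it only checks that the packages are importable, not that the NLTK stopwords or the spaCy model were downloaded.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the subset of module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Python 3.9+ is required by the project.
print("Python version OK:", sys.version_info >= (3, 9))

# nltk and spacy are the two packages named in the install steps.
print("Missing modules:", missing_modules(["nltk", "spacy"]))
```

An empty list of missing modules means both packages are visible to the interpreter you are running, which is a good hint that the right venv is active.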

Usage

Let us now cluster the comments made on this video, going through each of the steps involved in the project's usage.

Comments Extraction

The video ID is located in the URL of the video page, right after the v= URL parameter.

For this video, it is IUTGFQpKaPU.
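If you would rather not pick the ID out by hand, it can be read from the URL with the standard library. This is a small sketch with a hypothetical helper name; it assumes a standard watch URL that carries the `v=` parameter.

```python
from urllib.parse import parse_qs, urlparse


def extract_video_id(url):
    """Return the value of the v= query parameter, or None if it is absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("v")
    return values[0] if values else None


print(extract_video_id("https://www.youtube.com/watch?v=IUTGFQpKaPU"))  # IUTGFQpKaPU
```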

Run
```
python src/comments-extractor/yt-comments-extractor.py -h
```

yt-comments-extractor.py is a command line tool, and the command above prints its help text:

```
usage: yt-comments-extractor.py [-h] VIDEO_ID JSON_PATH

Command Line Utility to extract and save YouTube comments of a specified video to a json file

positional arguments:
  VIDEO_ID    a YouTube video's ID
  JSON_PATH   relative path of json file to save the data

options:
  -h, --help  show this help message and exit
```
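For readers curious how a help text of this shape comes about, here is a minimal `argparse` sketch that accepts the same two positional arguments. It is an illustration only, under the assumption that the tool uses `argparse`; it is not the project's actual source.

```python
import argparse


def build_parser():
    """Build a parser with the two positional arguments shown in the help text."""
    parser = argparse.ArgumentParser(
        prog="yt-comments-extractor.py",
        description=(
            "Command Line Utility to extract and save YouTube comments "
            "of a specified video to a json file"
        ),
    )
    parser.add_argument("VIDEO_ID", help="a YouTube video's ID")
    parser.add_argument("JSON_PATH", help="relative path of json file to save the data")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["IUTGFQpKaPU", "raw-data/comments_1.json"])
print(args.VIDEO_ID, args.JSON_PATH)
```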

To fetch the comments made on the shown video, run:
```
python src/comments-extractor/yt-comments-extractor.py IUTGFQpKaPU raw-data/comments_1.json
```

It may take a while for the program to finish execution, after which it prints the message

Comments Data saved successfully.

Now navigate to the JSON file path specified earlier, and you should see the JSON file containing the comments fetched from the video, along with their respective like counts.

Note

The fetched comments don't include replies to the comments on the video, so the total number of comments fetched may be less than the actual number of comments on the video.
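To make the difference concrete, the sketch below reads one page of a `commentThreads` response from the YouTube Data API and separates the top-level comments from the replies that are left out. The field names follow the API's documented response shape; the helper names and the sample data are made up for illustration and do not come from this project.

```python
def top_level_comments(response):
    """Extract (text, like_count) pairs from a commentThreads response dict."""
    comments = []
    for item in response.get("items", []):
        snippet = item["snippet"]["topLevelComment"]["snippet"]
        comments.append((snippet["textDisplay"], snippet["likeCount"]))
    return comments


def skipped_reply_count(response):
    """Sum the replies that a top-level-only fetch leaves out."""
    return sum(item["snippet"]["totalReplyCount"] for item in response.get("items", []))


# Hand-written sample shaped like one page of an API response (not real data).
sample = {
    "items": [
        {"snippet": {"totalReplyCount": 2,
                     "topLevelComment": {"snippet": {"textDisplay": "Great video", "likeCount": 5}}}},
        {"snippet": {"totalReplyCount": 0,
                     "topLevelComment": {"snippet": {"textDisplay": "Thanks!", "likeCount": 1}}}},
    ]
}

print(top_level_comments(sample))   # [('Great video', 5), ('Thanks!', 1)]
print(skipped_reply_count(sample))  # 2
```

In this sample, two comments are fetched while the video would show four, because the two replies are not counted.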

Comments Clustering

Now we move on to clustering the comments that have been fetched.

Run
```
python src/main.py -h
```

main.py, like yt-comments-extractor.py, is a command line tool.

```
usage: main.py [-h] JSON_PATH CSV_PATH

Command Line Utility to cluster (pre-saved) YouTube comments and visualise the results

positional arguments:
  JSON_PATH   relative path of json file to be processed
  CSV_PATH    relative path of csv file to save the processed data

options:
  -h, --help  show this help message and exit
```

Run
```
python src/main.py raw-data/comments_1.json data/comments_1.csv
```

The following lines of output will be displayed in sequence:

Reading data... Data read successfully

Cleaning data... Data cleaned successfully (28.06)s

Extracting features from cleaned data... Feature extraction performed successfully

Clustering datapoints... Datapoints clustered successfully

Writing data to data/comments_1.csv... Data written to csv successfully

Task complete (34.54s)


Press Enter to view the visual results

Upon pressing Enter, one word-cloud per generated cluster is displayed on your screen, one at a time. You can save these word-clouds to a folder.

TODO: Auto-save the generated word-clouds and bar graph plot on the go.

Each word-cloud represents, through the size of its words, the frequency and relevance of the most-used words in the comments belonging to that cluster.
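The counting behind such a picture can be sketched in a few lines. The example below tallies word frequencies per cluster with the standard library; it is an illustration under our own assumptions (made-up rows, plain whitespace splitting) and not the project's actual word-cloud code, which may weigh words differently.

```python
from collections import Counter, defaultdict


def word_frequencies_by_cluster(rows):
    """Map each cluster label to a Counter of the words in its comments."""
    frequencies = defaultdict(Counter)
    for cluster, text in rows:
        frequencies[cluster].update(text.lower().split())
    return frequencies


# Made-up (cluster, cleaned comment) pairs, for illustration only.
rows = [(0, "great song"), (0, "great voice"), (1, "nice editing")]
freqs = word_frequencies_by_cluster(rows)
print(freqs[0].most_common(1))  # [('great', 2)]
```

A word that occurs more often in a cluster gets a higher count here, which is what a word-cloud would draw in a larger size.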

After all the word-clouds, a bar graph is displayed, comparing the number of comments and the total likes received by each of the clusters.
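The two quantities compared in that bar graph are simple per-cluster totals. Here is a minimal sketch of how they could be computed from (cluster, like count) pairs; the function name and the sample rows are hypothetical, and the layout of the project's CSV file is not assumed.

```python
from collections import defaultdict


def summarise_clusters(rows):
    """Return {cluster: (comment_count, total_likes)} for (cluster, likes) rows."""
    counts = defaultdict(int)
    likes = defaultdict(int)
    for cluster, like_count in rows:
        counts[cluster] += 1
        likes[cluster] += like_count
    return {cluster: (counts[cluster], likes[cluster]) for cluster in sorted(counts)}


# Made-up (cluster, like count) pairs, for illustration only.
rows = [(0, 5), (0, 1), (1, 12)]
print(summarise_clusters(rows))  # {0: (2, 6), 1: (1, 12)}
```

Here cluster 1 has fewer comments but more likes than cluster 0, which is the kind of contrast the bar graph is meant to make visible.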

And voilà! You have now successfully clustered and analysed the comments on a video using this project.

Footnotes

How to Clone a repository? ↩
