This project is a full-stack application that aims to track and perform analysis on Social Networking Services (SNS) based on a related keyword. The tool scrapes SNS platforms and displays the results on a dashboard where users can gain insights. The current build supports scraping from Twitter, Reddit, and Pastebin.
pip install -r requirements.txt
- At the main directory of the repository, run:
python app.py
- Open a web browser to this address:
NOTE: To reuse past results based on feather files, refer to Part 2.
- Enter a keyword in the search bar (e.g. Ransomware)
- Fill in the advanced options as follows. Platforms: choose which platforms to scrape from: Twitter, Reddit, Pastebin. (The default setting scrapes from all 3 platforms.)
- Subreddit (optional): if the Reddit platform is selected, the user can specify multiple subreddits to scrape, separated by commas. E.g. hacking,sysadmin
- Time range selection: choose a time range from which data will be scraped. (The default setting is 7 days.)
- Depth: choose the maximum amount of data to scrape from the preset values quick, standard, and deep (10,000, 50,000, and 100,000 iterations respectively). The user can also select the custom option for a custom value. (The default if the user does not specify is 10,000.)
- Refinement: the user can enter an additional keyword for refinement. This feature works like Google's exact-match search operator (""), telling the tool to return only results that contain the provided keyword.
- The user will then click search
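The advanced options above can be summarised in a short sketch. The names here (`build_request`, `DEPTH_PRESETS`) are illustrative assumptions, not the app's actual API; they only show how the documented defaults and delimiters behave:

```python
# Hypothetical mapping of the advanced options to a scrape request.
# The function and constant names are illustrative, not this app's internals.
DEPTH_PRESETS = {"quick": 10_000, "standard": 50_000, "deep": 100_000}

def build_request(keyword, platforms=None, subreddits="", days=7,
                  depth="quick", refinement=None):
    return {
        "keyword": keyword,
        # Default scrapes from all three supported platforms
        "platforms": platforms or ["twitter", "reddit", "pastebin"],
        # Commas delimit multiple subreddits, e.g. "hacking,sysadmin"
        "subreddits": [s.strip() for s in subreddits.split(",") if s.strip()],
        "days": days,  # default time range is 7 days
        # A preset name resolves to its iteration count; a custom int passes through
        "max_items": DEPTH_PRESETS.get(depth, depth),
        # Exact-match refinement keyword, like Google's "" operator
        "refinement": refinement,
    }

req = build_request("Ransomware", subreddits="hacking,sysadmin", depth="standard")
```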
This section is for reusing past scraped results. Users can upload feather files to view data on the dashboard without performing scraping operations again.
- Upload past results: the user can upload one or more feather files.
- Disable scraper: after uploading the file(s), the user must check the disable scraper option.
- The user will then click search
(NOTE: This feature requires a Twitter Developer Account. As with previous usage, to upload an existing JSON file, please refer to Part 2.)
Users can choose to view a relationship model for Twitter users based on data scraped from Twitter.
- Select user
Step 1: Navigate to the Twitter Relationship Visualizer page; it can be accessed from the scraping results page. A list of users from the results is shown on the webpage for easy access and browsing. Select one user from the result list and enter it into the search field. This example uses "ka0com".
- Upload credentials
Step 2: The end user must have a Twitter Developer Account and a registered application to obtain a bearer token, which is needed to authenticate with the official Twitter API. After obtaining the token, place it in a ".ini" file in the format shown in the image above. The user can then upload the credential file to the website to authenticate with the Twitter API.
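The exact layout of the ".ini" file is the one shown in the image above; the section and key names below (`[twitter]`, `bearer_token`) are assumptions for illustration only. A minimal sketch of reading such a file with Python's standard `configparser`:

```python
# Hypothetical credentials file layout -- match the image in this README:
#
#   [twitter]
#   bearer_token = YOUR_BEARER_TOKEN
#
# Section and key names here are assumptions, not confirmed by the source.
import configparser

def read_bearer_token(path):
    config = configparser.ConfigParser()
    config.read(path)
    return config["twitter"]["bearer_token"]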
- Advanced Options
Step 3: The user needs to specify the desired level/depth. This example uses "3", which is also the default if none is specified. If a high level is chosen, the tool will take a long time to gather all the required data. The user can now click the search button to execute the scraping process.
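To see why a high depth takes so long, a rough back-of-envelope sketch helps. This models fan-out growth as an assumption; it is not the tool's actual traversal logic:

```python
# Rough illustration (an assumption, not this tool's algorithm) of why
# relationship depth is expensive: each level expands every account reached
# at the previous level, so work grows roughly geometrically.

def expansions_needed(avg_connections, depth):
    # one expansion per account reached at each level before the last
    return sum(avg_connections ** level for level in range(depth))

# With ~100 connections per account, depth 3 already means
# 1 + 100 + 10,000 = 10,101 account expansions.
```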