This repo contains Scrapy spiders for the dark web. They have been used to collect real data.
On the dark web, CAPTCHAs are an obstacle for spiders. This was handled by solving the CAPTCHAs manually and then passing the resulting cookies to the spider.
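
As a rough illustration of the cookie step, the sketch below turns a Cookie header string into a plain dict. It assumes you solved the CAPTCHA in a browser and copied the header from there. The function name `cookie_header_to_dict` and the sample cookie names are made up for this example and do not come from the spiders in this repo.

```python
def cookie_header_to_dict(header):
    """Convert a raw 'Cookie:' header value into a name -> value dict."""
    cookies = {}
    for pair in header.split(";"):
        # Split only on the first '=' so values containing '=' stay intact.
        name, sep, value = pair.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


# Placeholder header; paste the one copied from your own session instead.
raw_header = "PHPSESSID=abc123; captcha_passed=1"
cookies = cookie_header_to_dict(raw_header)
print(cookies)
```

A dict like this can be passed to the spider's requests through the `cookies` argument of `scrapy.Request`. The cookies usually expire, so expect to repeat the manual step when the site starts showing the CAPTCHA again.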
To use these files:
- Start a new Scrapy project.
- Update the project's generated settings, using the settings.py in this repo as a reference.
- Run the title scraper first. Before running it, verify that the selectors work for your website or write your own selectors, replace 'sample.website' with the address of your website, and supply valid cookies.
- Then, using the scraped data, repeat the same steps for the post scraper.
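
The hand-off from the title scraper to the post scraper could look roughly like the sketch below. It assumes the title scraper's items were exported as JSON Lines (for example with Scrapy's `-o titles.jsonl` option) and that each item has a `url` field. The file name, the field name, the sample addresses, and the helper `extract_post_urls` are all assumptions for illustration, so adjust them to whatever your title scraper actually outputs.

```python
import json


def extract_post_urls(lines, url_field="url"):
    """Collect unique post URLs, in order, from JSON Lines text."""
    urls = []
    seen = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        item = json.loads(line)
        url = item.get(url_field)
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# In-memory sample standing in for the exported file (hypothetical data).
sample_lines = [
    '{"title": "First thread", "url": "http://sample.website/post/1"}',
    '{"title": "Second thread", "url": "http://sample.website/post/2"}',
    '{"title": "First thread again", "url": "http://sample.website/post/1"}',
]
post_urls = extract_post_urls(sample_lines)
print(post_urls)
```

With a real export, an open file object can be passed in place of `sample_lines`, and the resulting list can serve as the post scraper's start URLs. Deduplicating here avoids requesting the same post twice when a thread is listed on several pages.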