BanglapediaCrawler

Scraping Banglapedia Data

In this repository I have built a crawler that extracts article data from the Banglapedia website.

I have extracted the following details:

  1. Title of the article.
  2. Main text body.
  3. Image URLs, if there are any.
  4. Source URL.
  5. Published date of the article.

I have also assigned each record an ID number, which is used only for numbering the extracted data.
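The fields listed above can be pictured as a single record per article. The following is only a minimal sketch, not the code in this repository: the class name `ArticleRecord`, the field names, the helper `to_row`, and the choice of a running counter starting at 1 are all assumptions made for illustration.

```python
from dataclasses import asdict, dataclass, field
from itertools import count
from typing import Dict, List

# Hypothetical running counter; the ID only numbers the extracted records.
_next_id = count(1)


@dataclass
class ArticleRecord:
    """One extracted article (field names are illustrative assumptions)."""
    title: str
    body: str
    source_url: str
    published_date: str
    image_urls: List[str] = field(default_factory=list)  # empty if no images
    record_id: int = field(default_factory=lambda: next(_next_id))


def to_row(record: ArticleRecord) -> Dict[str, object]:
    """Flatten a record into a dict suitable for one CSV row.

    The list of image URLs is joined into one comma-separated string,
    which is an assumed convention for this sketch.
    """
    row = asdict(record)
    row["image_urls"] = ",".join(record.image_urls)
    return row
```

Keeping the image URLs as a list until export means an article without images is simply an empty list, rather than a special value.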

After extracting the information, I saved it to a CSV file; it can also be saved as a JSON file. To save the output, run one of the following commands in your terminal:

scrapy crawl bangla -o bangladata.csv (saves the output as a CSV file)

scrapy crawl bangla -o bangladata.json (saves the output as a JSON file)
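Once a CSV export exists, it can be checked with Python's standard `csv` module. This is a rough sketch that is not part of the repository: the helper names `load_records` and `count_missing` are hypothetical, and the column names used in the comments are assumptions, since the actual header depends on the fields the spider yields.

```python
import csv
from typing import Dict, Iterable, List


def load_records(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse CSV text (an open file or any iterable of lines) into dicts.

    The first line is treated as the header row.
    """
    return list(csv.DictReader(lines))


def count_missing(records: List[Dict[str, str]], column: str) -> int:
    """Count records whose value for `column` is empty or absent."""
    return sum(1 for r in records if not (r.get(column) or "").strip())


# Example usage, assuming the export from the command above exists and
# has a column named "image_urls" (an assumed name):
#
#     with open("bangladata.csv", newline="", encoding="utf-8") as f:
#         records = load_records(f)
#     print(len(records), "articles;",
#           count_missing(records, "image_urls"), "without images")
```

Opening the file with `newline=""` lets the `csv` module handle line breaks inside quoted fields, which matters for long article bodies.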

Requirements:

  1. PyCharm IDE to run the script
  2. Python version 3.10
  3. Scrapy version 2.4.1