BanglapediaCrawler

Scraping Banglapedia Data

In this repository I have built a crawler that extracts all the article data from the Banglapedia website.

I have extracted the following details:

Title of the article.
Main text body.
Image URLs, if there are any.
Source URL.
Published date of the article.

In addition, each record gets an ID number, which is only used to number the collected data.

After extracting the information, I saved it to a CSV file; you can also save it as a JSON file. To save the output, run one of the following commands in your terminal:

To save as a CSV file:
scrapy crawl bangla -o bangladata.csv

To save as a JSON file:
scrapy crawl bangla -o bangladata.json
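As an alternative to passing `-o` on every run, Scrapy 2.1 and later can read export targets from the `FEEDS` setting in the project's `settings.py`. The sketch below assumes the same file names as the commands above. The `overwrite` option needs Scrapy 2.4 or newer; without it, local files are appended to, which leaves a JSON file invalid after a second run. Setting the encoding explicitly matters for JSON, because by default Scrapy escapes non-ASCII text (such as Bangla) as `\uXXXX` sequences in JSON output.

```python
# settings.py (sketch): export every scraped item to both files on each run.
FEEDS = {
    "bangladata.csv": {
        "format": "csv",
        "encoding": "utf8",
        "overwrite": True,  # Replace the file instead of appending (Scrapy 2.4+).
    },
    "bangladata.json": {
        "format": "json",
        "encoding": "utf8",  # Keep Bangla text readable instead of \uXXXX escapes.
        "indent": 2,
        "overwrite": True,
    },
}
```

With these settings in place, `scrapy crawl bangla` writes both files without any `-o` flag.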

Requirements:

PyCharm IDE to run the script
Python version 3.10
Scrapy version 2.4.1
