A collection of scrapers for gathering data from grant funders, intended to be used in the Beehive funding platform.
Written using Python 3 and Scrapy.
- Clone into a new directory:
git clone https://github.com/TechforgoodCAST/beehive-scrapers.git
- Set up a virtual environment:
python3 -m venv env
- Activate the virtual environment:
source env/bin/activate (Linux/macOS)
env\Scripts\activate (Windows)
- Install requirements:
pip install -r requirements.txt
- (Windows only) install pypiwin32:
pip install pypiwin32
Run the command:
scrapy genspider -t fund_spider fundname "fundurl.com/path-to-fund-list"
Where:
- fundname is the name of the funder (all lowercase, no spaces or special characters)
- "fundurl.com/path-to-fund-list" is the URL of the fund list page.
This will generate a skeleton scraper with the capability to:
- go through a fund list page
- generate titles and links for each fund
- go to a particular fund page and get more details
- go to the next page if the fund list is on more than one page
You'll need to adjust the CSS selectors depending on the exact structure of the list page.
To output the funds found to a funds.jl JSON Lines file, run (using the comicrelief spider as an example):
scrapy crawl comicrelief -o funds.jl
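A JSON Lines file holds one JSON object per line, so it can be consumed with the standard library alone. A minimal reader sketch (the field names in the sample are illustrative, not the scrapers' actual schema):

```python
import json


def load_funds(path):
    """Read a JSON Lines file: one JSON object per line."""
    funds = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                funds.append(json.loads(line))
    return funds


# Example: write a two-line sample file and read it back.
sample = (
    '{"title": "Fund A", "url": "https://example.com/a"}\n'
    '{"title": "Fund B", "url": "https://example.com/b"}\n'
)
with open("funds_sample.jl", "w", encoding="utf-8") as f:
    f.write(sample)

funds = load_funds("funds_sample.jl")
print(len(funds))  # -> 2
```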
To run all spiders use the following command:
python funderscrapers/crawl_all.py
You can also use crawl_all.bat on Windows or ./crawl_all.sh in Bash.