
Scraper

Franz Sauerwald edited this page Dec 8, 2021 · 3 revisions

The scraper processes a hash containing name and URL information. The HPI website server pushes this hash from a cron job.
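The exact shape of the pushed hash is not documented on this page. As a minimal sketch, assuming it maps each person's name to a profile URL, the pairs could be filtered and turned into rows as shown here. The payload contents and the helper `valid_person_urls` are made up for illustration and are not part of the actual scraper.

```ruby
require 'uri'

# Hypothetical payload: assumed to map a person's name to a profile URL.
payload = {
  'Ada Example' => 'https://example.org/people/ada',
  'Broken Entry' => 'not a url'
}

# Hypothetical helper: keep only entries whose URL parses as http(s).
def valid_person_urls(payload)
  payload.select do |_name, url|
    uri = URI.parse(url)
    uri.is_a?(URI::HTTP) && !uri.host.nil?
  rescue URI::InvalidURIError
    false
  end
end

# One row per valid pair, in the shape a name/url table might expect.
rows = valid_person_urls(payload).map { |name, url| { name: name, url: url } }
```

Filtering before storage is only one possible design; the real scraper may validate entries elsewhere or not at all.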

How to run

The scraper can be started by running the provided rake task: rake compass_scraper:scrape

For the scraper to have any effect, your local person_urls table must contain name and URL pairs (ask someone on team Datenkrake for the list).

How it works

Depending on which selectors are present on a loaded URL, the scraper uses a different technique to parse the information, e.g. the ParagraphScraper or the TableScraper.
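The selection step can be sketched roughly as follows. Only the class names ParagraphScraper and TableScraper come from this page; the methods `handles?` and `scrape`, the `SCRAPERS` list, and `scrape_page` are hypothetical. The real scraper presumably checks selectors with an HTML parser, whereas this sketch uses plain regular expressions to stay dependency-free.

```ruby
# Hypothetical scraper classes; only the class names come from the wiki.
class TableScraper
  # True if the page contains a <table> element.
  def self.handles?(html)
    html.match?(/<table\b/i)
  end

  # Returns the stripped text of every <td> cell.
  def self.scrape(html)
    html.scan(%r{<td[^>]*>(.*?)</td>}im).flatten.map(&:strip)
  end
end

class ParagraphScraper
  # True if the page contains a <p> element.
  def self.handles?(html)
    html.match?(/<p\b/i)
  end

  # Returns the stripped text of every <p> element.
  def self.scrape(html)
    html.scan(%r{<p[^>]*>(.*?)</p>}im).flatten.map(&:strip)
  end
end

# Order matters: the first scraper that recognizes the page wins.
SCRAPERS = [TableScraper, ParagraphScraper].freeze

# Picks a scraper based on what the page contains; returns [] if none fits.
def scrape_page(html)
  scraper = SCRAPERS.find { |s| s.handles?(html) }
  scraper ? scraper.scrape(html) : []
end
```

With this layout, supporting a new page structure would mean adding another class to `SCRAPERS` rather than changing the dispatch logic.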
