This unit covers how to post data to web servers, so that our spiders can perform searches and authenticate on websites that require it.
- Handling HTML forms
- Authenticating your spider via login forms
- Dealing with validation tokens
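To make the last topic more concrete, here is a minimal sketch of what a validation token looks like from the client's side. It uses only the Python standard library rather than Scrapy. The form markup, the field name `csrf_token`, and the helper names `HiddenInputParser` and `build_login_body` are all assumptions made for illustration. The idea is that the server embeds a hidden value in the form, and the client has to send it back along with the visible fields.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenInputParser(HTMLParser):
    """Collects name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("type") == "hidden" and "name" in attrs:
            self.hidden[attrs["name"]] = attrs.get("value") or ""


def build_login_body(page_html, username, password):
    """Return a URL-encoded POST body that echoes back the hidden fields."""
    parser = HiddenInputParser()
    parser.feed(page_html)
    fields = dict(parser.hidden)  # hidden fields first (e.g. the token)
    fields.update({"username": username, "password": password})
    return urlencode(fields)


# Hypothetical login form, for illustration only.
SAMPLE_FORM = """
<form action="/login" method="post">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="text" name="username">
  <input type="password" name="password">
</form>
"""

if __name__ == "__main__":
    print(build_login_body(SAMPLE_FORM, "user", "secret"))
```

A POST that omits the hidden token would typically be rejected by a site that validates it, which is why the spider has to load the form page first instead of posting directly.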
Check out the slides for this unit
- A simple spider to demonstrate how FormRequest works: spider_1_basic_form.py
- A spider that authenticates into quotes.toscrape.com: spider_2_login.py
- Same as the previous spider, but using the FormRequest.from_response() method: spider_3_login.py
Build a spider that authenticates into news.ycombinator.com and then extracts your own username and number of points from the top of the news page (fake user/pass: scrape1123/scrape1123).
Check out the spider once you're done.
Build a spider that scrapes all the quotes from every author listed in quotes.toscrape.com/search.aspx.