IvoDSBarros/web-scraping-use-cases


Web scraping use cases

Overview

The main subject of this repository is web scraping. Four web scrapers were developed in Python as use cases, covering three web data extraction scenarios:

  1. HTML tag-based extraction
  2. JSON var-based extraction
  3. JSON API-response-based extraction

1. HTML tag-based extraction

Website: unesco.org/en

To extract the list of world heritage sites designated by UNESCO.


py script: web_scraping_unesco_world_heritage_sites.py
csv output: unesco_world_heritage_sites.csv
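To illustrate HTML tag-based extraction, the sketch below collects the text of every element with a given tag and CSS class and writes it to CSV. It uses only Python's standard library (`html.parser`, `csv`), not necessarily what the original script uses. The `h3.site-name` selector is a hypothetical placeholder: the real markup on unesco.org has to be inspected first. Fetching the page (for example with `urllib.request.urlopen`) is left out so the parsing logic can run offline.

```python
import csv
import io
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect the text of every <tag class="css_class"> element."""

    def __init__(self, tag: str, css_class: str):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.results = []
        self._depth = 0      # > 0 while inside a matching element
        self._buffer = []    # text fragments of the current match

    def handle_starttag(self, tag, attrs):
        if self._depth:
            # Track nested elements of the same tag so the right end tag closes the match.
            if tag == self.tag:
                self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if self._depth and tag == self.tag:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace left over from the page layout.
                self.results.append(" ".join("".join(self._buffer).split()))

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def sites_to_csv(html: str) -> str:
    """Extract site names and return them as CSV text with a header row."""
    parser = TagTextExtractor("h3", "site-name")  # hypothetical selector
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["site"])
    writer.writerows([name] for name in parser.results)
    return out.getvalue()
```

Matching on a class name rather than on position in the page keeps the scraper working when unrelated markup around the list changes.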

Website: gfmag.com

To extract multiple tables on the world's best cities to live in, compiled by Global Finance magazine.


py script: web_scraping_best_cities_to_live.py
csv output: best_cities_to_live.csv
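Pages like the gfmag.com ranking spread their data across several `<table>` elements. The following is a minimal standard-library sketch of how such tables could be collected and stacked into one CSV. It assumes the tables are not nested, close their `<td>`/`<tr>` tags explicitly and share the same header row; none of this has been checked against the live page. The extra `table` column records which table each row came from.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> on a page as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table>
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.tables[-1].append(self._row)
            self._row = None
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def tables_to_csv(html: str) -> str:
    """Stack all tables into one CSV, assuming they share the same header row."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    tables = [t for t in parser.tables if t]
    out = io.StringIO()
    writer = csv.writer(out)
    if tables:
        writer.writerow(["table", *tables[0][0]])  # header taken from the first table
    for index, (_header, *rows) in enumerate(tables):
        for row in rows:
            writer.writerow([index, *row])
    return out.getvalue()
```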

2. JSON var-based extraction

To extract Rolling Stone's list of the 500 greatest albums of all time.


py script: web_scraping_rs_500_greatest_albums.py
csv output: rs_album_list.csv
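JSON var-based extraction targets pages that ship their data as a JavaScript variable inside a `<script>` tag rather than as rendered HTML. One way to pull it out, sketched below, is to locate the assignment with a regular expression and let `json.JSONDecoder().raw_decode` parse the value that follows. This copes with nested braces and stops at the end of the JSON value. The variable name `pageData`, the `albums` key and the `rank`/`title`/`artist` fields are hypothetical. The sketch also only works when the embedded value is strict JSON (double-quoted keys, no trailing commas).

```python
import csv
import io
import json
import re


def extract_js_var(html: str, var_name: str):
    """Return the JSON value assigned to `var_name` in the page's inline scripts."""
    # Match "<name> =" but not "<name> ==" so comparisons are skipped.
    pattern = r"\b" + re.escape(var_name) + r"\s*=(?!=)\s*"
    match = re.search(pattern, html)
    if match is None:
        raise ValueError(f"variable {var_name!r} not found")
    # raw_decode parses one JSON value and ignores whatever follows it (";</script>").
    value, _end = json.JSONDecoder().raw_decode(html[match.end():])
    return value


def albums_to_csv(html: str) -> str:
    """Write the embedded album list to CSV (variable and key names are hypothetical)."""
    albums = extract_js_var(html, "pageData")["albums"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["rank", "title", "artist"], extrasaction="ignore")
    writer.writeheader()
    writer.writerows(albums)
    return out.getvalue()
```

Parsing the whole variable at once, instead of scraping each album's HTML, means the field values arrive already structured.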

3. JSON API-response-based extraction

To extract data on all hardcover fiction and nonfiction books from every New York Times best sellers list.


py script: web_scraping_nyt_api.py
csv output: nyt_bestsellers_books.csv
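For the NYT case the data already arrives as JSON, so the work is building request URLs and flattening responses. The sketch below assumes the NYT Books API v3 layout: a `/lists/{date}/{list}.json` path with an `api-key` query parameter, and books under `results.books`. These details should be verified against the official API documentation. Only the offline parts can be exercised without credentials; `fetch_json` needs network access and a registered API key.

```python
import csv
import io
import json
import urllib.parse
import urllib.request

# Assumed base URL of the NYT Books API v3; verify against the official docs.
BASE_URL = "https://api.nytimes.com/svc/books/v3"


def list_url(list_name: str, date: str, api_key: str) -> str:
    """Build a request URL for one best sellers list on one date (assumed endpoint layout)."""
    path = f"/lists/{date}/{urllib.parse.quote(list_name)}.json"
    return BASE_URL + path + "?" + urllib.parse.urlencode({"api-key": api_key})


def fetch_json(url: str) -> dict:
    """Download and decode a JSON response (needs network access and a valid key)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def books_to_rows(payload: dict, list_name: str) -> list:
    """Flatten one list response; 'results' -> 'books' is an assumed response shape."""
    return [
        {
            "list": list_name,
            "rank": book.get("rank"),
            "title": book.get("title"),
            "author": book.get("author"),
        }
        for book in payload.get("results", {}).get("books", [])
    ]


def rows_to_csv(rows: list) -> str:
    """Write flattened book rows to CSV text with a header row."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["list", "rank", "title", "author"])
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

A run over both hardcover lists might loop over `["hardcover-fiction", "hardcover-nonfiction"]` (assumed list names), call `fetch_json(list_url(name, "current", key))` for each, and pass the concatenated `books_to_rows` output to `rows_to_csv`.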
