Web scraping use cases

Overview

This repository is about web scraping. Four web scrapers were developed in Python as use cases illustrating three web data extraction scenarios:

HTML tag-based extraction
JSON var-based extraction
JSON API-response-based extraction

1. HTML tag-based extraction

Website: unesco.org/en

To extract the list of World Heritage Sites designated by UNESCO.

py script: web_scraping_unesco_world_heritage_sites.py
csv output: unesco_world_heritage_sites.csv
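The HTML tag-based approach locates the data by tag name and CSS class and reads the text inside each match. The sketch below uses only the standard library's html.parser. The h3/span tags and the site-name/country classes are hypothetical placeholders, not UNESCO's actual markup, and the page is a hard-coded sample rather than a live download; inspect the real page to find the right tags before adapting it.

```python
import csv
import io
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text of every <tag class="css_class"> element, in page order."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0      # > 0 while inside a matching element
        self.buffer = []    # text fragments of the current match
        self.results = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == self.tag:  # nested element with the same tag name
                self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                # Collapse runs of whitespace inside the element's text
                self.results.append(" ".join("".join(self.buffer).split()))

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def extract_text(html, tag, css_class):
    parser = TagTextCollector(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.results


# Hypothetical markup standing in for a downloaded page
SAMPLE_HTML = """
<div class="card"><h3 class="site-name">Site A</h3><span class="country">Country X</span></div>
<div class="card"><h3 class="site-name">Site B</h3><span class="country">Country Y</span></div>
"""

names = extract_text(SAMPLE_HTML, "h3", "site-name")
countries = extract_text(SAMPLE_HTML, "span", "country")
# Pairs fields by position, so it assumes every card has both a name and a country
rows = list(zip(names, countries))

# Write the rows as CSV (to a string here; open a file instead for real output)
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["site", "country"])
writer.writerows(rows)
csv_text = out.getvalue()
```

Pairing two independent lists by position is the simplest option; if some entries can lack a field, iterating over each card element and reading both fields inside it is safer.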

Website: gfmag.com

To extract multiple tables on the world's best cities to live in, compiled by Global Finance magazine.

py script: web_scraping_best_cities_to_live.py
csv output: best_cities_to_live.csv
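When a page holds several tables, each one can be collected separately and tagged with its position so the rows stay distinguishable in a single CSV. This is a minimal standard-library sketch assuming simple, non-nested table markup with the header in the first row; the sample cities and scores are placeholders, not Global Finance data. If pandas and lxml are installed, pandas.read_html, which returns one DataFrame per table, is a shorter alternative.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every <table> on a page as a list of rows of cell strings.

    Nested tables, rowspan and colspan are not handled in this sketch.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per <table>; each is a list of rows
        self.row = None    # cells of the <tr> being read
        self.cell = None   # text fragments of the <td>/<th> being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.row = []
        elif tag in ("td", "th") and self.row is not None:
            self.cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.cell is not None and self.row is not None:
            self.row.append(" ".join("".join(self.cell).split()))
            self.cell = None
        elif tag == "tr" and self.row is not None:
            if self.row:
                self.tables[-1].append(self.row)
            self.row = None
            self.cell = None

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)


def tables_to_records(html):
    """Turn each table into dicts keyed by its first row, tagged with its index."""
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    records = []
    for index, table in enumerate(parser.tables):
        if not table:
            continue
        header, *body = table
        for row in body:
            record = dict(zip(header, row))
            record["table"] = index  # which table on the page the row came from
            records.append(record)
    return records


# Placeholder page with two ranking tables of different shapes
SAMPLE_HTML = """
<h2>Ranking 1</h2>
<table>
  <tr><th>Rank</th><th>City</th></tr>
  <tr><td>1</td><td>City A</td></tr>
  <tr><td>2</td><td>City B</td></tr>
</table>
<h2>Ranking 2</h2>
<table>
  <tr><th>Rank</th><th>City</th><th>Score</th></tr>
  <tr><td>1</td><td>City C</td><td>90.5</td></tr>
</table>
"""

records = tables_to_records(SAMPLE_HTML)
```

Because the tables can have different columns, the records are dictionaries; csv.DictWriter with the union of all keys as fieldnames can then write them to one file.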

2. JSON var-based extraction

Website: www.rollingstone.com

To extract Rolling Stone's list of the 500 greatest albums of all time.

py script: web_scraping_rs_500_greatest_albums.py
csv output: rs_album_list.csv
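In JSON var-based extraction the data is not in the visible HTML but in a JavaScript variable assigned inside a script tag, so it can be parsed as JSON instead of walking the markup. A minimal sketch, assuming the assignment has the form var NAME = {...}; and that the value is strict JSON (quoted keys, no trailing commas). The variable name galleryData and the album fields are hypothetical, not taken from rollingstone.com.

```python
import json
import re


def extract_js_var(html, var_name):
    """Return the JSON value assigned to `var <var_name> = ...` in a page."""
    pattern = re.compile(r"\bvar\s+" + re.escape(var_name) + r"\s*=\s*")
    match = pattern.search(html)
    if match is None:
        raise ValueError(f"variable {var_name!r} not found")
    # raw_decode parses exactly one JSON value and reports where it ended,
    # so the trailing ";" and </script> are simply ignored.
    value, _end = json.JSONDecoder().raw_decode(html[match.end():])
    return value


# Hypothetical page: the variable name and the item fields are placeholders
SAMPLE_HTML = """
<html><body>
<script>
var galleryData = {"items": [
  {"rank": 500, "title": "Album A", "artist": "Artist A"},
  {"rank": 499, "title": "Album B", "artist": "Artist B"}
]};
</script>
</body></html>
"""

data = extract_js_var(SAMPLE_HTML, "galleryData")
albums = sorted(
    ((item["rank"], item["title"], item["artist"]) for item in data["items"]),
    key=lambda album: album[0],
)
```

Using raw_decode rather than a regex for the closing brace avoids cutting the object short when nested braces or "};" appear inside string values.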

3. JSON API-response-based extraction

Website: developer.nytimes.com

To extract data on all hardcover fiction and nonfiction books from all of the New York Times best sellers lists.

py script: web_scraping_nyt_api.py
csv output: nyt_bestsellers_books.csv
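For API-response-based extraction the data arrives as JSON directly from an endpoint, so no HTML parsing is needed. The sketch assumes the NYT Books API v3 URL layout (lists/{date}/{list}.json with an api-key query parameter) and the response fields results, list_name, books, rank, title, author and primary_isbn13; verify these against the documentation on developer.nytimes.com. The sample payload is a placeholder, and fetch_json is defined but not called so the example runs offline.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumed NYT Books API v3 base URL; verify against developer.nytimes.com
BASE_URL = "https://api.nytimes.com/svc/books/v3"


def list_url(list_name_encoded, api_key, date="current"):
    """Build the request URL for one best sellers list on a given date."""
    path = f"{BASE_URL}/lists/{quote(date)}/{quote(list_name_encoded)}.json"
    return f"{path}?{urlencode({'api-key': api_key})}"


def fetch_json(url, timeout=30):
    """GET a URL and decode its JSON body (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def parse_books(payload):
    """Flatten one list response into rows, using the assumed field names."""
    results = payload.get("results", {})
    list_name = results.get("list_name")
    return [
        {
            "list_name": list_name,
            "rank": book.get("rank"),
            "title": book.get("title"),
            "author": book.get("author"),
            "isbn13": book.get("primary_isbn13"),
        }
        for book in results.get("books", [])
    ]


# Placeholder response shaped like the assumed API payload
SAMPLE_PAYLOAD = {
    "status": "OK",
    "results": {
        "list_name": "Hardcover Fiction",
        "books": [
            {"rank": 1, "title": "BOOK ONE", "author": "Author One",
             "primary_isbn13": "0000000000001"},
            {"rank": 2, "title": "BOOK TWO", "author": "Author Two",
             "primary_isbn13": "0000000000002"},
        ],
    },
}

rows = parse_books(SAMPLE_PAYLOAD)
url = list_url("hardcover-fiction", "YOUR_API_KEY")
```

Covering every best sellers list means one request per list name; since the API is rate-limited, pausing between calls (for example with time.sleep) is a sensible precaution.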

Repository: IvoDSBarros/web-scraping-use-cases