A collection of web scraping projects using Beautiful Soup, Selenium, and mixed approaches. Each project includes Python scripts and CSV files of the scraped data. Perfect for learning and experimenting with static and dynamic web scraping techniques.

chouaib-629/WebScraping

Web Scraping Projects

This repository collects the scripts and data from websites that I have scraped for learning and experimentation purposes. The material is organized into subfolders, where each subfolder corresponds to a specific website. The websites were scraped using different techniques: Beautiful Soup (bs4) for static content, Selenium for dynamic content, and a mix of both for certain cases.

Folder Structure

  • Main Folder: Contains subfolders, each representing a scraped website.
  • Subfolders: Named based on the website they were scraped from. Each subfolder contains:
    • The Python code used to scrape the website in two formats: .py and .ipynb.
    • The CSV file containing the scraped data.
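The one-CSV-per-subfolder layout described above can be sketched with the standard library alone. This is a minimal illustration, not the code used in the repository: the folder name `example_site`, the column names, and the helpers `save_rows` and `load_rows` are all hypothetical.

```python
import csv
from pathlib import Path


def save_rows(rows, csv_path):
    """Write a list of dicts to csv_path, creating the subfolder if needed."""
    csv_path = Path(csv_path)
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    # Column order follows the keys of the first row.
    fieldnames = list(rows[0].keys()) if rows else []
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return csv_path


def load_rows(csv_path):
    """Read a CSV file back into a list of dicts (all values are strings)."""
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    # "example_site" is a placeholder, not a real subfolder of this repository.
    rows = [
        {"title": "First item", "price": "10.00"},
        {"title": "Second item", "price": "12.50"},
    ]
    path = save_rows(rows, Path("example_site") / "example_site.csv")
    print(load_rows(path))
```

Keeping every value as a string mirrors what a scraper usually extracts from HTML; any numeric conversion can happen later, when the CSV is analyzed.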

Scraping Techniques

  1. Static Websites:

    • Scraped using Beautiful Soup (bs4).
    • These websites have static HTML content that can be directly accessed and parsed.
  2. Dynamic Websites:

    • Scraped using Selenium.
    • These websites load data dynamically through JavaScript, so Selenium drives a real browser to render the content before it is read.
  3. Mixed Approach:

    • Some websites required a combination of Selenium and bs4.
    • Selenium was used to render the dynamic content, and Beautiful Soup was used for parsing the HTML.
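The three techniques above differ mainly in how the HTML is obtained; the parsing step can stay the same. The sketch below assumes a made-up page structure (`li.book` items with `title` and `price` spans), and the function names `parse_books`, `fetch_static`, and `fetch_rendered` are hypothetical. It is not taken from the scripts in this repository.

```python
from bs4 import BeautifulSoup

# Inline sample so the parsing step can run without network access.
SAMPLE_HTML = """
<ul id="books">
  <li class="book"><span class="title">First Book</span><span class="price">10.00</span></li>
  <li class="book"><span class="title">Second Book</span><span class="price">12.50</span></li>
</ul>
"""


def parse_books(html):
    """Parse HTML with Beautiful Soup and return one dict per book item."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.find_all("li", class_="book"):
        rows.append({
            "title": item.find("span", class_="title").get_text(strip=True),
            "price": item.find("span", class_="price").get_text(strip=True),
        })
    return rows


def fetch_static(url):
    """Static site: download the HTML directly with Requests."""
    import requests  # imported here so the parser works without Requests installed

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def fetch_rendered(url):
    """Dynamic site: let a browser run the JavaScript, then return the HTML."""
    from selenium import webdriver  # imported here so static-only use needs no Selenium

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()


if __name__ == "__main__":
    # Static: parse_books(fetch_static(url)); mixed: parse_books(fetch_rendered(url)).
    print(parse_books(SAMPLE_HTML))
```

On real dynamic pages the content may not be present immediately after `driver.get`, so an explicit wait (for example Selenium's `WebDriverWait`) is often needed before reading `page_source`.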

Classification of Websites

Below is the list of websites classified by the scraping technique used:

Beautiful Soup (bs4)

Selenium

Mixed Approach

Requirements

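To replicate or run the scraping scripts used in this project, the following Python libraries are required:

  • Beautiful Soup (pip package beautifulsoup4, imported as bs4)
  • Selenium
  • Requests
  • lxml

html.parser is part of the Python standard library and needs no separate installation.

Ensure you have Python installed, along with the necessary libraries. Selenium also needs a browser driver (e.g., ChromeDriver for Google Chrome). Selenium 4.6 and later can download a matching driver automatically; with older versions, download the driver manually.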
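Before running a script, it can help to check which of these libraries are importable. The following is a small sketch using only the standard library; the helper name `missing_packages` and the `REQUIRED` mapping are hypothetical, and the mapping pairs each import name with the name typically used for `pip install`.

```python
import importlib.util

# Import name -> pip package name.
REQUIRED = {
    "bs4": "beautifulsoup4",
    "selenium": "selenium",
    "requests": "requests",
    "lxml": "lxml",
}


def missing_packages(modules=REQUIRED):
    """Return the pip names of the modules that cannot be found."""
    missing = []
    for module_name, pip_name in modules.items():
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module_name) is None:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All required libraries are installed.")
```

`html.parser` is not in the mapping because it is bundled with Python itself.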

Getting Started

  1. Clone this repository to your local machine:

    git clone https://github.com/chouaib-629/WebScraping.git
  2. Navigate to the desired subfolder to inspect the scraped data or associated scripts.

Notes

  • The data scraped from these websites is for educational purposes only. Review each website's terms and conditions before scraping it, and adhere to them.
  • The scripts and data are provided "as is" without warranty of any kind.

Author

This project is managed by a data science enthusiast and full-stack developer experimenting with web scraping techniques.

Contact Information

For questions or support, please contact me.
