Books Web Scraping and Analysis

This repository contains a Python project that scrapes book data from the Books to Scrape website, then runs exploratory data analysis (EDA) and builds visualizations to draw insights from the collected data.

Project Structure

The repository includes the following Jupyter Notebooks:

books-website-scraping.ipynb
- Extracts book data such as titles, ratings, prices, and availability from the website.
- Saves the scraped data into a CSV file for further analysis.
books-data-analysis.ipynb
- Loads the scraped data from the CSV file.
- Cleans and preprocesses the dataset (e.g., converting ratings to numerical values).
- Performs EDA and visualizations to analyze pricing, ratings, and other trends.
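
The rating conversion mentioned in the cleaning step can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the names `RATING_WORDS` and `rating_to_int` are hypothetical, and it assumes the ratings are stored as the words One to Five.

```python
# Hypothetical lookup table: rating word -> numeric value.
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def rating_to_int(word):
    """Map a rating word such as 'Three' to 3; return None for unknown values."""
    # Normalize whitespace and capitalization before the lookup.
    return RATING_WORDS.get(word.strip().title())


# Made-up sample values, including messy and invalid entries.
ratings = ["One", "five", " Three ", "n/a"]
numeric = [rating_to_int(r) for r in ratings]
print(numeric)
```

With pandas, the same dictionary could be passed to `Series.map` on the Rating column; unknown values would then become NaN instead of None.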

Features

Web Scraping:
- Extracts book details, including:
  - Book ID (UPC)
  - Title
  - Category
  - Rating
  - Price
  - Stock status (in stock or not)
  - Quantity available
Exploratory Data Analysis (EDA):
- Visualizes key metrics such as price distributions and rating trends.
- Identifies relationships between features like price and rating.
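
One simple way to look at the relationship between price and rating is to compare the mean price within each rating level. The sketch below uses only the standard library and made-up sample rows; the function name `mean_price_by_rating` is hypothetical and the notebook may take a different approach.

```python
from collections import defaultdict
from statistics import mean

# Made-up (rating, price) pairs standing in for the scraped rows.
rows = [(1, 20.0), (1, 30.0), (3, 45.5), (5, 10.0), (5, 50.0)]


def mean_price_by_rating(pairs):
    """Group prices by rating and return {rating: mean price}."""
    groups = defaultdict(list)
    for rating, price in pairs:
        groups[rating].append(price)
    # Sort by rating so the summary reads from lowest to highest.
    return {rating: mean(prices) for rating, prices in sorted(groups.items())}


summary = mean_price_by_rating(rows)
print(summary)
```

In pandas, a grouped mean such as `df.groupby("Rating")["Price [£]"].mean()` would give the equivalent summary, assuming the column names from the dataset description.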

Tools and Libraries

Web Scraping:
- requests
- BeautifulSoup
Data Manipulation:
- pandas
Data Visualization:
- matplotlib
- seaborn
- squarify

About the Dataset

The scraped dataset is available on Kaggle: Books Data on Kaggle

Dataset Columns:

ID: Unique Product Code (UPC) for each book.
Title: The title of the book.
Category: Genre or category of the book.
Price [£]: Price in GBP (£).
Rating: Star rating as displayed on the site, recorded as a word from One to Five.
Availability: Whether the book is in stock.
Quantity: The number of available copies.
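
As an illustration of how raw page text could be turned into the Price, Availability, and Quantity columns, here is a hedged sketch. It assumes the price appears as a string like "£51.77" and the availability as a string like "In stock (22 available)"; the helper names `parse_price` and `parse_availability` are hypothetical and not taken from the notebooks.

```python
import re


def parse_price(text):
    """Extract a float from a price string such as '£51.77'."""
    # Match only the digits so stray currency or encoding characters are ignored.
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no price found in {text!r}")
    return float(match.group())


def parse_availability(text):
    """Split a string like 'In stock (22 available)' into (in_stock, quantity)."""
    in_stock = "in stock" in text.lower()
    match = re.search(r"\((\d+)\s+available\)", text)
    # Assumption: a missing count is treated as zero copies.
    quantity = int(match.group(1)) if match else 0
    return in_stock, quantity


print(parse_price("£51.77"))
print(parse_availability("In stock (22 available)"))
```

Matching on digits alone keeps the price parser working even when the pound sign arrives mis-decoded (for example as "Â£"), which can happen if the response encoding is guessed incorrectly.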
