asibhossen897/amazon-scraper

Amazon Scraper

This repository contains scripts that scrape product information and images from Amazon using SeleniumBase and BeautifulSoup. They collect product data such as titles, ratings, review counts, and prices, and download the corresponding product images.

Features

  • Product Information Scraper: Extracts product links, titles, ratings, review counts, original prices, and discounted prices.
  • Product Image Downloader: Downloads the product images from the provided links and saves them locally.
  • Utility Functions: Includes helper functions to handle directory creation, saving images, and writing data to CSV files.

Requirements

  • Python 3.9+
  • SeleniumBase
  • BeautifulSoup (installed as beautifulsoup4)
  • pandas
  • requests

Installation

  1. Clone this repository:

    git clone https://github.com/asibhossen897/amazon-scraper.git
    cd amazon-scraper
  2. Install the required Python packages:

    pip install seleniumbase beautifulsoup4 pandas requests

Usage

1. Scrape Product Information

This script scrapes product information based on a search query and saves the data to a CSV file.

  • Script: product_info_scraper.py
  • Output: A CSV file containing product links, titles, ratings, review counts, and prices.
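To illustrate the parsing step, here is a minimal sketch of how BeautifulSoup could pull these fields out of a results page. It is not the code in product_info_scraper.py: the sample markup, the CSS selectors, and the names `parse_results` and `text_or_none` are all hypothetical, and real Amazon pages use different class names that change often.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: real Amazon result pages use different, frequently
# changing class names, so the selectors below are placeholders.
SAMPLE_HTML = """
<div class="result">
  <a class="link" href="/dp/EXAMPLE1"><span class="title">Example Python Book</span></a>
  <span class="rating">4.5 out of 5 stars</span>
  <span class="reviews">1,234</span>
  <span class="price">$29.99</span>
</div>
<div class="result">
  <a class="link" href="/dp/EXAMPLE2"><span class="title">Another Book</span></a>
  <span class="price">$15.00</span>
</div>
"""


def text_or_none(card, selector):
    """Return the stripped text of the first match, or None if it is missing."""
    node = card.select_one(selector)
    return node.get_text(strip=True) if node is not None else None


def parse_results(html, base_url="https://www.amazon.com"):
    """Extract one dict per result card from a search-results page."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for card in soup.select("div.result"):
        link = card.select_one("a.link")
        products.append({
            "link": base_url + link["href"] if link is not None else None,
            "title": text_or_none(card, "span.title"),
            "rating": text_or_none(card, "span.rating"),
            "review_count": text_or_none(card, "span.reviews"),
            "price": text_or_none(card, "span.price"),
        })
    return products
```

Calling `parse_results(SAMPLE_HTML)` returns a list of two dicts. Fields that are absent from a card come back as `None` rather than raising, which keeps one incomplete listing from aborting the whole run.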

Example

python product_info_scraper.py

This command creates a CSV file (e.g., python_book_links.csv) containing the product details returned for the search query.
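Scraped values usually arrive as display strings, so they often need cleaning before analysis. The sketch below shows one way to convert ratings, review counts, and prices to numbers. The helper names and the assumed input formats (such as "4.5 out of 5 stars" or "$1,299.99") are illustrative assumptions, not something the repository is known to do.

```python
import re


def parse_rating(text):
    """'4.5 out of 5 stars' -> 4.5; returns None when no number is present."""
    match = re.search(r"\d+(?:\.\d+)?", text or "")
    return float(match.group()) if match else None


def parse_count(text):
    """'1,234' or '(1,234)' -> 1234; returns None when no digits are present."""
    digits = re.sub(r"\D", "", text or "")
    return int(digits) if digits else None


def parse_price(text):
    """'$1,299.99' -> 1299.99; assumes '.' is the decimal separator."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text or "")
    return float(match.group().replace(",", "")) if match else None
```

These helpers accept `None` or empty strings and return `None` in that case. Abbreviated counts such as "1.2K" and locales that use a comma as the decimal separator are not handled.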

2. Download Product Images

This script downloads product images using the links generated by product_info_scraper.py.

  • Script: product_img_downloader.py
  • Input: A CSV file containing product links.
  • Output: Images saved in the images directory.

Example

python product_img_downloader.py

This command will download the product images and save them to the images directory.
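As a rough sketch of the download step, the helpers below fetch one image with requests and save it under the images directory. The function names are hypothetical and the code assumes the image URL is already known. How product_img_downloader.py finds image URLs on a product page is not shown here.

```python
from pathlib import Path
from urllib.parse import urlparse


def image_path(url, dest_dir="images"):
    """Map an image URL to a local file path inside dest_dir."""
    # Use the last path segment as the file name; fall back to a default
    # when the URL has no file name (for example, it ends in '/').
    name = Path(urlparse(url).path).name or "image.jpg"
    return Path(dest_dir) / name


def save_image(content, path):
    """Write raw image bytes to path, creating parent directories as needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(content)
    return path


def download_image(url, dest_dir="images", timeout=10.0):
    """Fetch url with requests and store the response body under dest_dir."""
    import requests  # imported here so the helpers above work without it

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of saving an error page
    return save_image(response.content, image_path(url, dest_dir))
```

Keeping path handling and file writing separate from the network call makes both easy to check without hitting a live site. Two URLs that share a file name would overwrite each other, so a real downloader may want to add an index or product ID to the name.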

Project Structure

  • product_info_scraper.py: Scrapes product information from Amazon.
  • product_img_downloader.py: Downloads product images based on the provided CSV file.
  • utils.py: Contains utility functions for saving images, creating directories, and writing to CSV files.
  • src/: Contains the CSV files generated by the scraper scripts.
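The CSV helpers described above might look something like the following sketch. It uses the standard-library csv module rather than pandas to stay dependency-free. The column names and the functions `write_products` and `read_links` are assumptions for illustration and may not match utils.py.

```python
import csv
from pathlib import Path

# Assumed column names; the actual CSV headers used by the scripts may differ.
FIELDS = ["link", "title", "rating", "review_count",
          "original_price", "discounted_price"]


def write_products(rows, path):
    """Write product dicts to a CSV file, creating the parent directory first."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as handle:
        # restval fills columns that a row does not provide.
        writer = csv.DictWriter(handle, fieldnames=FIELDS, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return path


def read_links(path, column="link"):
    """Return the non-empty values of one column, e.g. for the image downloader."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return [row[column] for row in csv.DictReader(handle) if row.get(column)]
```

For example, `write_products(rows, "src/python_book_links.csv")` followed by `read_links("src/python_book_links.csv")` would round-trip the product links between the two scripts.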

License

This project is licensed under the MIT License.

Author

Asib Hossen

Contributing

Contributions are welcome! Please submit a pull request or open an issue to discuss your ideas.

Disclaimer

This scraper is for educational purposes only. Scraping Amazon's website may violate its terms of service. Use it at your own risk.
