This repository contains scripts to scrape product information and images from Amazon using SeleniumBase and BeautifulSoup. The project is designed to help you gather data about products, such as titles, ratings, reviews, and prices, and download the corresponding product images.
- Product Information Scraper: Extracts product links, titles, ratings, review counts, original prices, and discounted prices.
- Product Image Downloader: Downloads the product images from the provided links and saves them locally.
- Utility Functions: Includes helper functions to handle directory creation, saving images, and writing data to CSV files.
- Python 3.9+
- SeleniumBase
- BeautifulSoup
- pandas
- requests
- Clone this repository:
  git clone https://github.com/yourusername/amazon-scraper.git
  cd amazon-scraper
- Install the required Python packages:
  pip install seleniumbase beautifulsoup4 pandas requests
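To confirm the installation worked, a short check like the one below can help. It is only a sketch: the function name `missing_modules` is hypothetical and not part of this repository. It relies on the fact that `beautifulsoup4` is imported as `bs4`, while the other packages are imported under their pip names.

```python
from importlib.util import find_spec
from typing import Iterable, List

# Import names, not pip names: beautifulsoup4 is imported as bs4.
REQUIRED_MODULES = ["seleniumbase", "bs4", "pandas", "requests"]


def missing_modules(names: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print(f"Missing: {', '.join(missing)}")
    else:
        print("All dependencies found.")
```

If anything is reported as missing, re-run the pip command above in the same environment you use to run the scripts.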
This script scrapes product information based on a search query and saves the data to a CSV file.
- Script: product_info_scraper.py
- Output: A CSV file containing product links, titles, ratings, review counts, and prices.
python product_info_scraper.py
This command will create a CSV file (e.g., python_book_links.csv) with product details from Amazon based on the search query.
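The CSV-writing step might look roughly like the sketch below. It uses only the standard library `csv` module rather than pandas, and it is not the repository's actual code. The column names in `FIELDNAMES` and the function name `write_products_csv` are assumptions chosen to match the fields listed above.

```python
import csv
from pathlib import Path
from typing import Dict, Iterable

# Hypothetical column names; the real script's headers may differ.
FIELDNAMES = [
    "link",
    "title",
    "rating",
    "review_count",
    "original_price",
    "discounted_price",
]


def write_products_csv(rows: Iterable[Dict[str, str]], csv_path: str) -> int:
    """Write product rows to csv_path and return the number of rows written."""
    path = Path(csv_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    # newline="" prevents blank lines between rows on Windows.
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        for row in rows:
            # Missing fields (e.g. no discount) become empty cells;
            # keys outside FIELDNAMES are dropped.
            writer.writerow({name: row.get(name, "") for name in FIELDNAMES})
            count += 1
    return count
```

Writing an empty cell for a missing field keeps every row the same width, so a product without a discounted price does not shift the remaining columns.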
This script downloads product images using the links generated by product_info_scraper.py.
- Script: product_img_downloader.py
- Input: A CSV file containing product links.
- Output: Images saved in the images directory.
python product_img_downloader.py
This command will download the product images and save them to the images directory.
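The download loop can be sketched as follows. This is an illustration under several assumptions, not the script's real implementation:

- The CSV is assumed to have a column of direct image URLs, here named `image_url`. The actual script may instead open each product page to find the image.
- The default fetcher uses the standard library `urllib` in place of `requests` so the example has no third-party dependencies.
- The `.jpg` extension and the `product_0001` naming scheme are placeholders.

```python
import csv
from pathlib import Path
from typing import Callable, List
from urllib.request import urlopen


def fetch_bytes(url: str) -> bytes:
    """Default fetcher: a plain HTTP GET via the standard library."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def download_images(
    csv_path: str,
    out_dir: str = "images",
    column: str = "image_url",  # hypothetical column name
    fetch: Callable[[str], bytes] = fetch_bytes,
) -> List[Path]:
    """Save one file per non-empty URL in `column` and return the saved paths."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved: List[Path] = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for index, row in enumerate(csv.DictReader(handle), start=1):
            url = (row.get(column) or "").strip()
            if not url:
                continue  # skip rows without an image URL
            try:
                data = fetch(url)
            except (OSError, ValueError) as error:
                # urllib's URLError subclasses OSError; skip and keep going.
                print(f"Skipping {url}: {error}")
                continue
            path = target / f"product_{index:04d}.jpg"
            path.write_bytes(data)
            saved.append(path)
    return saved
```

Passing the fetcher in as a parameter makes it easy to swap in `requests` (for example, a function that returns `requests.get(url, timeout=30).content`) or a stub for offline testing.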
- product_info_scraper.py: Scrapes product information from Amazon.
- product_img_downloader.py: Downloads product images based on the provided CSV file.
- utils.py: Contains utility functions for saving images, creating directories, and writing to CSV files.
- src/: Contains the CSV files generated by the scraper scripts.
This project is licensed under the MIT License.
Contributions are welcome! Please submit a pull request or open an issue to discuss your ideas.
This scraper is for educational purposes only. Scraping Amazon's website might violate their terms of service. Use it at your own risk.