A Python library that lets you easily scrape product data from popular e-commerce websites using basic product information.
Collecting product data is useful for monitoring price changes or deciding which e-commerce site to buy from. However, writing a web scraper from scratch can be cumbersome and time-consuming. This Python library aims to simplify that process: with basic inputs like a product name and a store name, you get easy access to rich product data.
To install, run the following:
pip install web-scraper-python-library
The following code retrieves the product data for an iPhone 12 from Amazon as JSON and writes it to a file.
product: a product name, as you would enter it on the search page of a company's website
company: 'eBay', 'Walmart', or 'Amazon'
from web_scraper import main as m
json_product_data = m.scrape("iphone 12", "Amazon")
# write json to file
with open("amazon_product_data.json", "w") as file:
    file.write(json_product_data)
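Because the result is passed straight to file.write, scrape presumably returns a JSON string rather than Python objects. A minimal sketch of parsing it with the standard json module follows. The string here is a shortened stand-in for a real result (an assumption, so the snippet runs without network access), and the field names are taken from the sample output in this README.

```python
import json

# Stand-in for the string returned by m.scrape("iphone 12", "Amazon").
# Assumption: the real return value is a JSON array of product records.
json_product_data = (
    '[{"company": "Amazon", "asin": "B09HWS3VGM", "price": 84.0, "rating": 3.8}]'
)

# Parse the JSON text into a list of dicts, one per product record.
products = json.loads(json_product_data)

for item in products:
    print(f'{item["asin"]}: ${item["price"]:.2f} (rating {item["rating"]})')
```

In real use, replace the stand-in string with the value returned by m.scrape.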
Example contents of amazon_product_data.json:
[
    {
        "company": "Amazon",
        "asin": "B09HWS3VGM",
        "name": "TCL 10 5G UW 128GB Diamond Gray Smartphone (Verizon) (Renewed)",
        "price": 84.0,
        "extraction_date": "2023-04-27 16:59:56",
        "rating": 3.8,
        "num_ratings": 106.0,
        "image_url": "https://m.media-amazon.com/images/I/41e-4yZQl9L._AC_UY218_.jpg",
        "url": "https://www.amazon.com/TCL-Diamond-Smartphone-Verizon-Renewed/dp/B09HWS3VGM/ref=sr_1_42?keywords=iphone+12&qid=1682629195&sr=8-42"
    },
    ...
    {
        "company": "Amazon",
        "asin": "B0BS986JRZ",
        "name": "QIMHAI Smartphone Unlocked Cell Phones S22 Ultra 6.1in HD Screen Cheap Phones 2GB/16GB Android 10 Straight Talk Phone 5000mAh 128GB Extension Dual Sim Boost Mobile Phones Telefonos (Gold)",
        "price": 79.99,
        "extraction_date": "2023-04-27 16:59:56",
        "rating": 1.9,
        "num_ratings": 6.0,
        "image_url": "https://m.media-amazon.com/images/I/71fa-n5E69L._AC_UY218_.jpg",
        "url": "https://www.amazon.com/QIMHAI-Smartphone-Unlocked-Extension-Telefonos/dp/B0BS986JRZ/ref=sr_1_43?keywords=iphone+12&qid=1682629195&sr=8-43"
    }
]
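Search results can include loosely related listings: neither record in the sample above is an iPhone 12. One way to narrow the results is to filter by keyword and rating after parsing. The sketch below is not part of the library; best_matches is a hypothetical helper, and it assumes each record carries the "name", "price", and "rating" fields shown in the sample.

```python
def best_matches(products, keyword, min_rating=0.0):
    """Return records whose name contains `keyword` (case-insensitive)
    and whose rating is at least `min_rating`, cheapest first.

    Hypothetical helper, not part of the library.
    """
    keyword = keyword.lower()
    matches = [
        p for p in products
        if keyword in p.get("name", "").lower()
        and p.get("price") is not None          # skip records without a price
        and (p.get("rating") or 0.0) >= min_rating
    ]
    return sorted(matches, key=lambda p: p["price"])


# The two records from the sample output, reduced to the fields used here.
sample = [
    {
        "asin": "B09HWS3VGM",
        "name": "TCL 10 5G UW 128GB Diamond Gray Smartphone (Verizon) (Renewed)",
        "price": 84.0,
        "rating": 3.8,
    },
    {
        "asin": "B0BS986JRZ",
        "name": "QIMHAI Smartphone Unlocked Cell Phones S22 Ultra 6.1in HD Screen Cheap Phones 2GB/16GB Android 10 Straight Talk Phone 5000mAh 128GB Extension Dual Sim Boost Mobile Phones Telefonos (Gold)",
        "price": 79.99,
        "rating": 1.9,
    },
]

iphones = best_matches(sample, "iphone")                      # no name contains "iphone"
well_rated = best_matches(sample, "smartphone", min_rating=3.0)
print(len(iphones), [p["asin"] for p in well_rated])
```

Matching on the product name is a simple heuristic; it can still miss relevant listings or keep accessories, so treat the filtered list as a starting point.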