[ASCII art banner: a desktop computer whose screen reads "PriceScraperAI, Fetching Data"]
PriceScraperAI is a Python tool that scrapes pricing information from competitor websites, extracts structured data with AI, and presents the results in a clean, tabular format. It builds on Beautiful Soup, Firecrawl, Jina AI, and OpenAI's GPT API for data extraction and comparison.
- Scraping: Fetch data from multiple competitor websites.
- AI-Powered Extraction: Use OpenAI's GPT API to intelligently extract structured pricing data (plans, prices, features, etc.).
- Flexible Preprocessing: Automatically clean and preprocess scraped data for better results.
- Easy Comparison: Present data in a readable table format or save it as JSON for further analysis.
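To make the extraction and JSON export features concrete, here is a minimal sketch of turning a model reply into structured data. The `pricing_tiers`, `plan`, and `price` keys come from the example output in this README; the `parse_pricing_reply` helper and the sample reply are assumptions made for illustration and may not match the project's actual code.

```python
import json

def parse_pricing_reply(reply_text):
    """Parse a model reply that is expected to contain a JSON object.

    Hypothetical helper: returns a dict with a "pricing_tiers" list,
    or an empty list of tiers when the reply is not usable JSON.
    """
    try:
        data = json.loads(reply_text)
    except json.JSONDecodeError:
        return {"pricing_tiers": []}
    if not isinstance(data, dict):
        return {"pricing_tiers": []}
    tiers = data.get("pricing_tiers", [])
    if not isinstance(tiers, list):
        tiers = []
    # Keep only entries that are dicts and at least name a plan.
    clean = [t for t in tiers if isinstance(t, dict) and "plan" in t]
    return {"pricing_tiers": clean}

# Made-up reply text standing in for what the GPT API might return.
sample_reply = '{"pricing_tiers": [{"plan": "Free", "price": "$0"}, {"note": "no plan"}]}'
result = parse_pricing_reply(sample_reply)

# Serialize for saving to disk or further analysis.
print(json.dumps(result, indent=2))
```

Falling back to an empty tier list instead of raising keeps one malformed reply from stopping a run across several sites, though a real tool might prefer to log or retry.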
- Python
- Beautiful Soup - For static web scraping.
- Firecrawl - For advanced scraping of dynamic content.
- Jina AI - API-based scraping capabilities.
- OpenAI GPT API - For intelligent data extraction.
- PrettyTable - For tabular display of extracted data.
- Clone the repository:
git clone https://github.com/your-username/PriceScraperAI.git
cd PriceScraperAI
- Install dependencies:
pip install -r requirements.txt
- Usage
- Add your OpenAI and Mendable API keys.
- Update the competitor_sites list with target websites and URLs.
- Run the script:
python main.py
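The README does not show what `competitor_sites` looks like, so the following is only a sketch of one plausible shape, with a small check to run before scraping. The field names (`name`, `url`, `provider`), the `validate_sites` helper, and the `example.com` URLs are all assumptions; the provider names are taken from the tool list above.

```python
from urllib.parse import urlparse

# Hypothetical layout: each entry names a site, its pricing URL, and the
# scraping provider to use. The real script's field names may differ.
competitor_sites = [
    {"name": "Site A", "url": "https://example.com/a/pricing", "provider": "Beautiful Soup"},
    {"name": "Site B", "url": "https://example.com/b/pricing", "provider": "Jina AI"},
]

KNOWN_PROVIDERS = {"Beautiful Soup", "Firecrawl", "Jina AI"}

def validate_sites(sites):
    """Return a list of readable problems; an empty list means the config looks usable."""
    problems = []
    for i, site in enumerate(sites):
        parsed = urlparse(site.get("url", ""))
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"entry {i}: url must be an absolute http(s) URL")
        if site.get("provider") not in KNOWN_PROVIDERS:
            problems.append(f"entry {i}: unknown provider {site.get('provider')!r}")
        if not site.get("name"):
            problems.append(f"entry {i}: missing name")
    return problems

print(validate_sites(competitor_sites))
```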
- Example Output
Extracted Content Table
+---------------+----------------+----------------------------------------------------+
| Site          | Provider Name  | Extracted Content                                  |
+---------------+----------------+----------------------------------------------------+
| LeetCode      | Beautiful Soup | {"pricing_tiers": [{"plan": "Free", "price": ...}  |
| GeeksForGeeks | Jina AI        | {"pricing_tiers": [{"plan": "Student", "price...}  |
+---------------+----------------+----------------------------------------------------+
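The table above truncates long extracted content to fit a fixed column. As an illustration of that idea, here is a dependency-free sketch that truncates cells and draws a similar ASCII table. It is a standard-library stand-in, not the project's PrettyTable code; the `truncate` and `render_table` names and the sample row are assumptions.

```python
def truncate(text, width):
    """Shorten text to at most `width` characters (width >= 3), marking the cut with '...'."""
    if len(text) <= width:
        return text
    return text[: width - 3] + "..."

def render_table(headers, rows, max_width=50):
    """Render rows as an ASCII table, truncating long cells to max_width."""
    cells = [[truncate(str(c), max_width) for c in row] for row in rows]
    # Each column is as wide as its longest header or (truncated) cell.
    widths = [
        max([len(h)] + [len(r[i]) for r in cells]) for i, h in enumerate(headers)
    ]
    border = "+" + "+".join("-" * (w + 2) for w in widths) + "+"

    def line(values):
        return "| " + " | ".join(v.ljust(w) for v, w in zip(values, widths)) + " |"

    return "\n".join([border, line(headers), border] + [line(r) for r in cells] + [border])

# Placeholder row; the content string is made up for illustration.
long_content = '{"pricing_tiers": [{"plan": "Free", "price": "placeholder"}, {"plan": "Pro", "price": "placeholder"}]}'
table = render_table(
    ["Site", "Provider Name", "Extracted Content"],
    [["Site A", "Beautiful Soup", long_content]],
)
print(table)
```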
- Customization
- Modify the competitor_sites variable to target additional websites.
- Adjust preprocess_content for content-specific truncation or cleaning.
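For reference when adjusting `preprocess_content`, a cleaning and truncation step could look roughly like the sketch below. The project's real signature and logic are not shown in this README, so the `max_chars` parameter, its default, and the regex-based tag stripping (used here instead of Beautiful Soup to keep the example dependency-free) are all assumptions.

```python
import html
import re

def preprocess_content(raw, max_chars=4000):
    """Hypothetical cleanup step: strip markup, collapse whitespace, truncate."""
    # Drop script and style blocks entirely, including their contents.
    text = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", raw)
    # Remove remaining tags, leaving a space so adjacent words stay separate.
    text = re.sub(r"(?s)<[^>]+>", " ", text)
    # Decode entities such as &amp; and collapse runs of whitespace.
    text = html.unescape(text)
    text = re.sub(r"\s+", " ", text).strip()
    # Truncate so the text sent to the model stays within a size budget.
    return text[:max_chars]

sample = "<html><style>p{color:red}</style><p>Pro plan &amp; more:   $10/mo</p></html>"
print(preprocess_content(sample))
```

Lowering `max_chars` trades completeness for smaller prompts; a site whose pricing sits far down the page may need a larger budget or site-specific trimming.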