AI Web Scraper combines web scraping with AI-based content analysis. It uses Selenium to scrape a page and Groq to analyze the extracted content. Given a target URL, the tool scrapes the page, analyzes its content, and returns structured insights in JSON format.
- Web Scraping: Extracts title and content from a webpage.
- Content Analysis: Provides structured insights like themes, summaries, and actionable recommendations.
- Headless Browsing: Optimized for efficiency using headless Chrome.
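The headless-browsing feature can be pictured with a short sketch. It assumes the `selenium-webdriver` package and a local Chrome recent enough to accept `--headless=new`. The names `headlessChromeArgs` and `scrapePage` are hypothetical, and the repository's actual implementation may differ.

```javascript
// Hypothetical helper: Chrome flags for a headless session (assumed, not from the repo).
function headlessChromeArgs() {
  return ["--headless=new", "--disable-gpu"];
}

// Hypothetical sketch: open a page headlessly and return its title and visible text.
async function scrapePage(url) {
  // Loaded lazily so the helper above works even without Selenium installed.
  const { Builder, By } = require("selenium-webdriver");
  const chrome = require("selenium-webdriver/chrome");

  const options = new chrome.Options().addArguments(...headlessChromeArgs());
  const driver = await new Builder()
    .forBrowser("chrome")
    .setChromeOptions(options)
    .build();
  try {
    await driver.get(url);
    const title = await driver.getTitle();
    const content = await driver.findElement(By.css("body")).getText();
    return { title, content };
  } finally {
    await driver.quit(); // always release the browser process
  }
}
```

Calling `scrapePage("https://example.com")` would resolve to an object with `title` and `content` fields, which matches the two items listed under Web Scraping.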
- Clone this repository:
  git clone https://github.com/ArielFalcon/ai_web_scrapper.git
- Navigate to the project directory:
  cd ai_web_scrapper
- Install dependencies:
  npm install
- Create a .env file in the root directory and add your API key for Groq:
  GROQ_API_KEY=your_api_key_here
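To illustrate how the key in `.env` might reach the program, here is a minimal sketch. Many Node.js projects use the `dotenv` package for this; whether this repository does is an assumption, so the example parses the file text by hand. `parseEnv` and `requireApiKey` are hypothetical helpers.

```javascript
// Hypothetical helper: parse KEY=value lines from the text of a .env file.
function parseEnv(text) {
  const vars = {};
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (!trimmed || trimmed.startsWith("#")) continue; // skip blanks and comments
    const eq = trimmed.indexOf("=");
    if (eq === -1) continue; // ignore lines without an assignment
    vars[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim();
  }
  return vars;
}

// Hypothetical helper: fail early with a clear message if the key is missing.
function requireApiKey(env) {
  const key = env.GROQ_API_KEY;
  if (!key) {
    throw new Error("GROQ_API_KEY is not set; add it to your .env file");
  }
  return key;
}

const env = parseEnv("# Groq credentials\nGROQ_API_KEY=your_api_key_here\n");
console.log(requireApiKey(env)); // prints "your_api_key_here"
```

Checking for the key at startup surfaces a missing or misnamed `.env` entry before any scraping begins.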
Run the scraper with a target URL:
node app.js <target_url>
For example:
node app.js https://example.com
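A script invoked this way would typically read the URL from `process.argv` and validate it before launching a browser. The sketch below shows one way to do that with Node's built-in `URL` class; `parseTargetUrl` is a hypothetical name, and the checks in the real `app.js` may differ.

```javascript
// Hypothetical helper: read and validate the target URL from an argv-style array.
function parseTargetUrl(argv) {
  const raw = argv[2]; // argv[0] is the node binary, argv[1] is the script path
  if (!raw) {
    throw new Error("Usage: node app.js <target_url>");
  }
  let url;
  try {
    url = new URL(raw); // throws a TypeError on malformed input
  } catch {
    throw new Error(`Invalid URL: ${raw}`);
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${url.protocol}`);
  }
  return url.href;
}

console.log(parseTargetUrl(["node", "app.js", "https://example.com"]));
// prints "https://example.com/"
```

In a real entry point the function would be called as `parseTargetUrl(process.argv)`.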
The results are saved as a JSON file named scraped-and-analyzed-content.json in the root directory.
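The exact schema of the output file is not documented here, so the following is only an illustrative shape based on the features listed above (title, content, themes, summary, recommendations). All field names and sample values are placeholders, and `buildResult` is a hypothetical helper.

```javascript
// Hypothetical result shape; field names are illustrative, not confirmed by the project.
function buildResult(url, page, analysis) {
  return {
    url,
    scrapedAt: new Date().toISOString(),
    title: page.title,
    content: page.content,
    analysis, // e.g. { themes, summary, recommendations }
  };
}

// Placeholder values only; real output depends on the page and the model's response.
const result = buildResult(
  "https://example.com",
  { title: "Example Domain", content: "Placeholder page text." },
  { themes: ["placeholder"], summary: "Placeholder summary.", recommendations: [] }
);

// Pretty-print with two-space indentation, as one might before writing the file.
const json = JSON.stringify(result, null, 2);
console.log(json);
```

Passing `json` to `fs.writeFileSync("scraped-and-analyzed-content.json", json)` would produce a file with the name mentioned above.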
- Node.js (v16 or later)
- Chrome browser
- The .env file is not included in this repository for security reasons. You must create it manually and add your API key.
This project is licensed under the MIT License.