AI scraping service using Selenium and a GUI to avoid anti-scraping protection. The AI service uses Groq with LLaMA as an open-source LLM.

ArielFalcon/ai_web_scrapper

AI Web Scraper

AI Web Scraper combines web scraping with LLM-based content analysis. It uses Selenium to scrape a page and Groq to analyze the extracted content. Given a target URL, it scrapes the page and returns structured insights in JSON format.

Features

  • Web Scraping: Extracts the title and content from a webpage.
  • Content Analysis: Provides structured insights like themes, summaries, and actionable recommendations.
  • Headless Browsing: Optimized for efficiency using headless Chrome.
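
The content-analysis step could look roughly like the sketch below, which builds (but does not send) a request to Groq's OpenAI-compatible chat completions endpoint. This is not taken from the project's source: the function name buildAnalysisRequest, the model name, the prompt wording, and the JSON keys (themes, summary, recommendations) are all assumptions chosen to mirror the features listed above.

```javascript
// Sketch only: builds a Groq chat-completions request for page analysis.
// Assumptions: model name, prompt wording and JSON keys are illustrative,
// not taken from this repository.
const GROQ_URL = "https://api.groq.com/openai/v1/chat/completions";

function buildAnalysisRequest(page, apiKey, model = "llama-3.3-70b-versatile") {
  if (!apiKey) throw new Error("GROQ_API_KEY is not set");
  const body = {
    model,
    // Ask for a JSON object so the reply can be parsed directly.
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content:
          "Analyze the web page and reply with a JSON object containing " +
          '"themes" (array), "summary" (string) and "recommendations" (array).',
      },
      { role: "user", content: `Title: ${page.title}\n\nContent:\n${page.content}` },
    ],
  };
  return {
    url: GROQ_URL,
    options: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}
```

On Node.js 18 or later, where fetch is available globally, the returned url and options can be passed straight to fetch(url, options). The JSON text of the analysis would then be found in choices[0].message.content of the response body.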

Installation

  1. Clone this repository:

    git clone https://github.com/ArielFalcon/ai_web_scrapper.git
  2. Navigate to the project directory:

    cd ai_web_scrapper
  3. Install dependencies:

    npm install
  4. Create a .env file in the root directory and add your API key for Groq:

    GROQ_API_KEY=your_api_key_here
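
As an illustration of step 4, the sketch below reads GROQ_API_KEY from the environment or from a .env file and fails early when it is missing. It is a dependency-free stand-in for a package such as dotenv, which the project may well use instead. The helper names parseEnv and loadGroqApiKey are hypothetical.

```javascript
const fs = require("fs");

// Parse KEY=value lines; blank lines and lines starting with # are ignored.
// This is a simplified parser (no quoting or multi-line values).
function parseEnv(text) {
  const vars = {};
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (!trimmed || trimmed.startsWith("#")) continue;
    const eq = trimmed.indexOf("=");
    if (eq === -1) continue;
    vars[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim();
  }
  return vars;
}

// Prefer a key already set in the environment, then fall back to the .env file.
function loadGroqApiKey(envPath = ".env") {
  if (process.env.GROQ_API_KEY) return process.env.GROQ_API_KEY;
  const vars = fs.existsSync(envPath)
    ? parseEnv(fs.readFileSync(envPath, "utf8"))
    : {};
  if (!vars.GROQ_API_KEY) {
    throw new Error("GROQ_API_KEY is missing; add it to .env");
  }
  return vars.GROQ_API_KEY;
}
```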

Usage

Run the scraper with a target URL:

    node app.js <target_url>

For example:

    node app.js https://example.com
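
A script invoked this way would typically check its argument before launching a browser. The following is a minimal sketch of that check using the built-in URL class. The function name parseTargetUrl and the restriction to http and https are assumptions, not behavior confirmed by this repository.

```javascript
// Return the normalized target URL from an argv-style array, or throw.
// Hypothetical helper: app.js may validate its input differently.
function parseTargetUrl(argv) {
  const raw = argv[2]; // argv[0] is the node binary, argv[1] is the script path
  if (!raw) throw new Error("Usage: node app.js <target_url>");
  let url;
  try {
    url = new URL(raw); // throws a TypeError on malformed input
  } catch {
    throw new Error(`Invalid URL: ${raw}`);
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${url.protocol}`);
  }
  return url.href;
}
```

Calling parseTargetUrl(process.argv) at the top of the script would reject a missing or malformed argument before any scraping starts.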

Outputs

The results are saved as a JSON file named scraped-and-analyzed-content.json in the root directory.
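
Writing that file can be sketched as follows. The file name comes from the description above; the helper name saveResult, the use of the current working directory as the "root directory", and the pretty-printed formatting are assumptions.

```javascript
const fs = require("fs");
const path = require("path");

const OUTPUT_FILE = "scraped-and-analyzed-content.json";

// Serialize the result as indented JSON and write it into `dir`.
// Assumption: `dir` defaults to the directory the script is run from.
function saveResult(result, dir = process.cwd()) {
  const file = path.join(dir, OUTPUT_FILE);
  fs.writeFileSync(file, JSON.stringify(result, null, 2) + "\n", "utf8");
  return file; // full path of the written file
}
```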

Prerequisites

  • Node.js (v16 or later)
  • Chrome browser

Notes

  • The .env file is not included in this repository for security reasons. You must create it manually and add your API key.

License

This project is licensed under the MIT License.
