Web Scraper in Python
Updated Sep 6, 2023 - Python
Summarize articles using NLTK, Gemini and Trafilatura
This project is a Python-based web scraping tool that uses the Trafilatura library to extract and save text content from a list of specified websites. The program is designed to process multiple URLs, extract their main content, and save each website's content to a separate .txt file.
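A minimal sketch of the workflow this description outlines, not the project's actual code. It assumes Trafilatura's `fetch_url` and `extract` functions for download and main-content extraction; the helper names (`url_to_filename`, `fetch_main_text`, `save_pages`) and the file-naming scheme are hypothetical and chosen only for illustration.

```python
import re
from pathlib import Path
from typing import Callable, Iterable, List, Optional
from urllib.parse import urlparse


def url_to_filename(url: str) -> str:
    """Map a URL to a filesystem-safe .txt name (hypothetical naming scheme)."""
    parsed = urlparse(url)
    raw = f"{parsed.netloc}{parsed.path}".strip("/")
    # Collapse anything that is not a letter, digit, dot, dash or underscore.
    safe = re.sub(r"[^A-Za-z0-9._-]+", "_", raw).strip("_")
    return f"{safe or 'page'}.txt"


def fetch_main_text(url: str) -> Optional[str]:
    """Download a page and extract its main text with Trafilatura."""
    # Imported lazily so the other helpers work without the library installed.
    import trafilatura

    downloaded = trafilatura.fetch_url(url)  # None if the download failed
    if downloaded is None:
        return None
    return trafilatura.extract(downloaded)  # None if no main content was found


def save_pages(
    urls: Iterable[str],
    out_dir: str,
    fetch: Callable[[str], Optional[str]] = fetch_main_text,
) -> List[Path]:
    """Write each page's extracted text to its own .txt file in out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for url in urls:
        text = fetch(url)
        if not text:
            # Skip pages that could not be downloaded or had no extractable text.
            continue
        path = out / url_to_filename(url)
        path.write_text(text, encoding="utf-8")
        written.append(path)
    return written
```

Calling `save_pages(["https://example.com/blog/post-1"], "output")` would, under these assumptions, write the extracted text to `output/example.com_blog_post-1.txt`. Passing the fetch step in as a parameter keeps the file handling usable without network access, for example in tests.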
A web scraper with an LLM-powered document suggestion system that combines web crawling, data extraction, and advanced AI capabilities to recommend relevant documents.