Dependencies

Python 3.8+
Requests
BeautifulSoup

Install

01 -) Clone:

$ git clone https://github.com/thiiagoms/links-extractor

02 -) Go to links-extractor directory:

$ cd links-extractor
links-extractor $

Run

01 -) In your script.py call Extractor main class like:

from src.services.extractor import Extractor
from src.utils.printer import Printer

urls = ['https://github.com', 'https://google.com']
extractor = Extractor()
links = extractor.extract(urls, timeout=10)

for url, extracted_links in links.items():
    Printer.message(f"Url: {url}")
    for link in extracted_links:
        Printer.success(f" { link}")
    Printer.message("###############")

And you should receive this output:

$ python example.py

Url: https://github.com

  #start-of-content
  https://github.com/
  /signup?ref_cta=Sign+up&ref_loc=header+logged+out&ref_page=%2F&source=header-home
  /features/actions
  /features/packages
  /features/security

###############

Url: https://google.com

  https://www.google.com/imghp?hl=pt-BR&tab=wi
  https://maps.google.com.br/maps?hl=pt-BR&tab=wl
  https://play.google.com/?hl=pt-BR&tab=w8

###############

Bonus

01 -) Run tests with pytest:

links-extractor $ pytest

02 -) Run autopep8 lint on files like:

links-extractor $  autopep8 --in-place --aggressive --aggressive src/services/extractor.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Extract links from urls 🗜️

Dependencies

Install

Run

Bonus

Files

README.md

Latest commit

History

README.md

File metadata and controls

Extract links from urls 🗜️

Dependencies

Install

Run

Bonus