minimal_web_scraper.main#

Overview#

Functions#

download(target_url, timeout)

Download the HTML content of the URL using the requests library.

scrape(url)

Orchestrate downloading and parsing the resource at the URL.

Attributes#

headers

-

Functions#

minimal_web_scraper.main.download(target_url: str, timeout: int = 1) → tuple[bytes, str | None]#

Download the HTML content of the URL using the requests library.

The request is sent with a custom header.

Parameters:

target_url (str) – URL to download

timeout (int) – timeout for the request; defaults to 1

Raises:

ValueError – if target_url is not a valid URL

Exceptions raised by the requests library or by urlparse may also propagate to the caller.

Returns:

a tuple of the HTML page content (bytes) and its encoding (str or None)
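
The following is a minimal sketch of how a caller might check a URL before calling download() and turn the returned (content, encoding) tuple into text. It does not reproduce the module's implementation. Both `is_valid_url` and `decode_page` are hypothetical helpers, and the validity rule (an http or https scheme plus a host) and the UTF-8 fallback are assumptions.

```python
from typing import Optional
from urllib.parse import urlparse


def is_valid_url(target_url: str) -> bool:
    """Hypothetical check: accept only http(s) URLs that include a host."""
    parts = urlparse(target_url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def decode_page(content: bytes, encoding: Optional[str], fallback: str = "utf-8") -> str:
    """Decode a (content, encoding) pair into text.

    Assumption: when the encoding is None or not a known codec, fall back
    to UTF-8 and replace undecodable bytes instead of raising.
    """
    try:
        return content.decode(encoding or fallback, errors="replace")
    except LookupError:
        # The declared encoding name is not a codec Python knows about.
        return content.decode(fallback, errors="replace")
```

With the documented function, the pattern would presumably be `content, encoding = download(url)` followed by `decode_page(content, encoding)`. Because the encoding may be None, a caller needs some fallback of this kind before treating the bytes as text.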

minimal_web_scraper.main.scrape(url: str) → Any#

Orchestrate downloading and parsing the resource at the URL.

Parameters:

url (str) – URL of the resource to download and parse

Returns:

the information extracted by the parse() method of a parsers.BaseParser implementation

Raises:

parsers.exceptions.ParserNotFound – if no parser is found for the URL
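
The orchestration described above can be sketched as follows. This is an illustration under stated assumptions, not the module's code: `ParserNotFound` and `BaseParser` here are local stand-ins for the classes in the parsers package, the `can_parse()` method and the `parse(content, encoding)` signature are assumed, and `TitleParser`, `scrape_sketch`, and `fake_download` are hypothetical names. Selecting the parser before downloading is also an assumption. The downloader is passed in so the sketch runs without network access.

```python
from typing import Any, Callable, Optional, Sequence, Tuple
from urllib.parse import urlparse


class ParserNotFound(Exception):
    """Stand-in for parsers.exceptions.ParserNotFound."""


class BaseParser:
    """Stand-in for parsers.BaseParser; both method signatures are assumed."""

    def can_parse(self, url: str) -> bool:
        raise NotImplementedError

    def parse(self, content: bytes, encoding: Optional[str]) -> Any:
        raise NotImplementedError


class TitleParser(BaseParser):
    """Toy parser for example.com that extracts the <title> text."""

    def can_parse(self, url: str) -> bool:
        return urlparse(url).netloc == "example.com"

    def parse(self, content: bytes, encoding: Optional[str]) -> Any:
        text = content.decode(encoding or "utf-8", errors="replace")
        start = text.find("<title>")
        end = text.find("</title>")
        if start == -1 or end == -1:
            return None
        return text[start + len("<title>"):end].strip()


Downloader = Callable[[str], Tuple[bytes, Optional[str]]]


def scrape_sketch(url: str, parsers: Sequence[BaseParser], downloader: Downloader) -> Any:
    """Pick the first parser that accepts the URL, download the page, parse it."""
    for parser in parsers:
        if parser.can_parse(url):
            content, encoding = downloader(url)
            return parser.parse(content, encoding)
    # No registered parser accepts this URL.
    raise ParserNotFound(f"no parser available for {url}")


def fake_download(url: str) -> Tuple[bytes, Optional[str]]:
    """Offline stand-in for download() so the sketch runs without a network."""
    return b"<html><title> Example </title></html>", "utf-8"
```

Calling `scrape_sketch("https://example.com/", [TitleParser()], fake_download)` returns the title text, while a URL on any other host raises `ParserNotFound`. In real use the documented download() function would take the place of `fake_download`.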

Attributes#

minimal_web_scraper.main.headers#