minimal_web_scraper.parsers#

Subpackage of minimal_web_scraper. Provides the API to build and manage parsers.

Overview#

Classes#
`BaseParser`	Base class for creating custom parsers.

Functions#
`add_parser`(parser)	Add the given parser to the list that the scraper checks when parsing data.

Exceptions#
`ParserNotFound`	Exception raised when the scraper does not find a parser associated with a URL.

class minimal_web_scraper.parsers.BaseParser#

Base class for creating custom parsers.

Subclasses must override parse() and scope_urls. Parsers are not intended to be instantiated: parse() is a class method and is called on the parser class itself.

Overview

Attributes#
`scope_urls`	Define which URLs the parser is intended to parse.

Methods#
`parse`(html_content, encoding)	Abstract method to parse HTML chunks.

Members

abstract classmethod parse(html_content: bytes, encoding: str | None) → Any#

Abstract method to parse HTML chunks.

Parameters:
html_content – the HTML chunk to parse, as bytes
encoding – the encoding of html_content, or None

Returns: the extracted elements
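
The subclassing contract described above can be sketched as follows. This is a minimal illustration, not the library's code: the `BaseParser` defined here is a local stand-in so that the snippet runs on its own, `TitleParser` is a hypothetical parser, and treating `scope_urls` as a list of URL prefixes is an assumption, since this page does not document its type.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any


class BaseParser(ABC):
    """Local stand-in for minimal_web_scraper.parsers.BaseParser (assumed shape)."""

    # Assumption: scope_urls is a list of URL prefixes; the real type is not documented here.
    scope_urls: list[str] = []

    @classmethod
    @abstractmethod
    def parse(cls, html_content: bytes, encoding: str | None) -> Any:
        """Parse an HTML chunk and return the extracted elements."""


class TitleParser(BaseParser):
    """Hypothetical parser that extracts the text of the first <title> tag."""

    scope_urls = ["https://example.com/"]

    @classmethod
    def parse(cls, html_content: bytes, encoding: str | None) -> Any:
        # Fall back to UTF-8 when no encoding is supplied.
        text = html_content.decode(encoding or "utf-8", errors="replace")
        start = text.find("<title>")
        if start == -1:
            return None
        start += len("<title>")
        end = text.find("</title>", start)
        if end == -1:
            return None
        return text[start:end].strip()


# The parser is used through the class itself; no instance is created.
result = TitleParser.parse(b"<html><title> Hello </title></html>", None)
print(result)
```

With the real package, the stand-in class would presumably be replaced by an import of `BaseParser` from `minimal_web_scraper.parsers`.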

minimal_web_scraper.parsers.add_parser(parser: Type[BaseParser] | None = None) → None#

Add the given parser to the list that the scraper checks when parsing data.

Parameters: parser – a subclass of BaseParser. If no parser is provided (the default, None), all imported parsers are added.
Raises: TypeError – if the parser argument is not a subclass of BaseParser

exception minimal_web_scraper.parsers.ParserNotFound#

Bases: BaseParserException

Exception raised when the scraper does not find a parser associated with a URL.
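
A caller might handle this exception roughly as sketched below. The exception classes are local stand-ins that mirror the hierarchy shown above, and `find_scope` is a hypothetical prefix-based lookup written only for this example; how the scraper actually matches URLs to parsers is not described on this page.

```python
from __future__ import annotations


class BaseParserException(Exception):
    """Local stand-in for the library's base exception."""


class ParserNotFound(BaseParserException):
    """Local stand-in for minimal_web_scraper.parsers.ParserNotFound."""


def find_scope(url: str, scopes: dict[str, str]) -> str:
    """Hypothetical lookup: return the parser name whose URL prefix matches."""
    for prefix, parser_name in scopes.items():
        if url.startswith(prefix):
            return parser_name
    raise ParserNotFound(f"no parser registered for {url}")


scopes = {"https://example.com/blog/": "BlogParser"}

try:
    find_scope("https://example.org/", scopes)
except ParserNotFound as exc:
    # Catching BaseParserException instead would also handle this case.
    message = str(exc)
    print(message)
```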