minimal_web_scraper.parsers#
Subpackage of minimal_web_scraper. Provides the API to build and manage parsers.
Overview#
BaseParser | Base class for creating custom parsers.
add_parser | Add the given parser to the list that the scraper checks for parsing data.
ParserNotFound | Exception raised when the scraper does not find a parser associated with a URL.
Classes#
- class minimal_web_scraper.parsers.BaseParser#
Base class for creating custom parsers.
Subclasses must override parse() and scope_urls. Parsers are not intended to be instantiated.
Overview
Attributes#
scope_urls | Define which URLs the parser is intended to parse.
Members
- scope_urls: list[str]#
Define which URLs the parser is intended to parse.
- abstract classmethod parse(html_content: bytes, encoding: str | None) → Any#
Abstract method to parse HTML chunks.
- Parameters:
html_content – the raw HTML to parse
encoding – the associated encoding of the HTML
- Returns:
the extracted elements
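The subclassing contract described above can be sketched as follows. This is a minimal illustration, not code from the package: the local `BaseParser` is a stand-in that mirrors the documented interface so the example runs without the package installed, and `TitleParser`, `_TitleExtractor`, the example URL, the UTF-8 fallback, and the returned dictionary shape are all assumptions. The exact format expected in `scope_urls` (full URL, prefix, or pattern) is not stated here, so a plain URL string is used.

```python
from abc import ABC, abstractmethod
from html.parser import HTMLParser
from typing import Any, List, Optional


# Stand-in mirroring the documented interface. With the package installed,
# this would presumably be: from minimal_web_scraper.parsers import BaseParser
class BaseParser(ABC):
    scope_urls: List[str] = []

    @classmethod
    @abstractmethod
    def parse(cls, html_content: bytes, encoding: Optional[str]) -> Any:
        ...


class _TitleExtractor(HTMLParser):
    """Hypothetical helper that collects the text inside <title>."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


class TitleParser(BaseParser):
    # Assumption: a plain URL string is an acceptable scope entry.
    scope_urls = ["https://example.com/"]

    @classmethod
    def parse(cls, html_content: bytes, encoding: Optional[str]) -> Any:
        # Assumption: fall back to UTF-8 when no encoding is supplied.
        text = html_content.decode(encoding or "utf-8", errors="replace")
        extractor = _TitleExtractor()
        extractor.feed(text)
        extractor.close()
        return {"title": extractor.title.strip()}


# The parser is used through the class itself, never through an instance.
result = TitleParser.parse(
    b"<html><head><title> Hello </title></head></html>", None
)
```

Because `parse()` is a classmethod and `scope_urls` is a class attribute, the scraper can work with the class object directly, which is consistent with the note that parsers are not intended to be instantiated.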
Functions#
- minimal_web_scraper.parsers.add_parser(parser: Type[BaseParser] | None = None) → None#
Add the given parser to the list that the scraper checks for parsing data.
- Parameters:
parser – the parser to add; it must be a subclass of BaseParser. If no parser is provided, all imported parsers are added (default: None).
- Raises:
TypeError – raised when the argument parser is not a subclass of BaseParser
Exceptions#
- exception minimal_web_scraper.parsers.ParserNotFound#
Bases: BaseParserException
Exception raised when the scraper does not find a parser associated with a URL.
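Since `ParserNotFound` derives from `BaseParserException`, a caller can catch either the specific exception or the base class. The sketch below uses local stand-ins for both exceptions and a hypothetical `select_parser` helper with prefix matching. How the scraper actually matches URLs against `scope_urls` is not documented here, so the lookup logic is purely an assumption.

```python
from typing import Dict, List


class BaseParserException(Exception):
    """Stand-in for the package's base parser exception."""


class ParserNotFound(BaseParserException):
    """Stand-in for minimal_web_scraper.parsers.ParserNotFound."""


def select_parser(url: str, scopes: Dict[str, List[str]]) -> str:
    """Hypothetical lookup: return the name of the first parser whose
    scope contains a prefix of the URL, or raise ParserNotFound."""
    for name, urls in scopes.items():
        if any(url.startswith(prefix) for prefix in urls):
            return name
    raise ParserNotFound(f"no parser registered for {url}")


# Hypothetical scope table keyed by parser name.
scopes = {"NewsParser": ["https://example.com/news"]}

matched = select_parser("https://example.com/news/today", scopes)

try:
    select_parser("https://example.org/", scopes)
except BaseParserException as exc:
    # Catching the base class also catches ParserNotFound.
    message = str(exc)
```

Catching `BaseParserException` is the broader choice when any parser-related failure should be handled the same way; catching `ParserNotFound` alone keeps other errors visible.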