minimal_web_scraper.parsers.base#

Overview#

Classes#

BaseParser

Base class for creating custom parsers.

Classes#

class minimal_web_scraper.parsers.base.BaseParser#

Base class for creating custom parsers.

Subclasses must override parse() and scope_urls. Parsers are not intended to be instantiated; parse() is a class method and is called on the parser class itself.

Overview

Attributes#

scope_urls

Define which URLs the parser is intended to parse.

Methods#

parse(html_content, encoding)

Abstract class method to parse HTML chunks.

Members

scope_urls: list[str]#

Define which URLs the parser is intended to parse.

abstract classmethod parse(html_content: bytes, encoding: str | None) → Any#

Abstract method to parse HTML chunks.

Parameters:
  • html_content – the raw HTML to parse
  • encoding – the associated encoding of the HTML

Returns:

The extracted elements.
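
Example

The interface described above could be implemented roughly as follows. This is a minimal sketch, not the library's own code: the BaseParser class here is a local stand-in that mirrors the documented signature so the example runs on its own, and TitleParser, _TitleCollector, the example.com scope URL, and the UTF-8 fallback for a missing encoding are all illustrative assumptions. How scope_urls entries are matched against real URLs is not specified in this page, so the value shown is only a placeholder.

```python
from abc import ABC, abstractmethod
from html.parser import HTMLParser
from typing import Any, Optional


class BaseParser(ABC):
    """Local stand-in mirroring the documented interface (assumption)."""

    scope_urls: list[str]

    @classmethod
    @abstractmethod
    def parse(cls, html_content: bytes, encoding: Optional[str]) -> Any:
        """Parse a chunk of raw HTML and return the extracted elements."""


class _TitleCollector(HTMLParser):
    """Hypothetical helper: collects the text inside <title> elements."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


class TitleParser(BaseParser):
    """Hypothetical parser that extracts the page title."""

    # Placeholder scope; the matching rules are not documented here.
    scope_urls = ["https://example.com/"]

    @classmethod
    def parse(cls, html_content: bytes, encoding: Optional[str]) -> Any:
        # Assumption: fall back to UTF-8 when no encoding is supplied.
        text = html_content.decode(encoding or "utf-8", errors="replace")
        collector = _TitleCollector()
        collector.feed(text)
        collector.close()
        return {"title": collector.title.strip()}


# The parser is used through the class, never through an instance.
result = TitleParser.parse(
    b"<html><head><title> Hello </title></head></html>", None
)
```

Because parse() is a class method, the sketch calls it directly on TitleParser; the return type is left as Any, so a dictionary is just one possible shape for the extracted elements.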