Create a parser
The most common approach is to create a file dedicated to the parser or parsers.
Here is the skeleton of a parser compatible with the library:
from minimal_web_scraper import BaseParser


class ExampleParser(BaseParser):
    scope_urls = []

    @classmethod
    def parse(cls, html_content, encoding):
        return []
The parser must inherit from the BaseParser class from the library.
It must contain a scope_urls variable and a parse method.
The parse method must be a @classmethod and return a list of dictionaries.
The recommended tools for writing the parse method are the BeautifulSoup and re libraries.
However, any library that can parse an HTML-formatted string can be used.
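As a rough illustration of what a filled-in parse method might look like, the sketch below collects every link on a page and returns one dictionary per link. It uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra dependencies, together with re to tidy the link text. Several things here are assumptions, not part of the library: the BaseParser stand-in class (in real code it would be imported from minimal_web_scraper), the scope_urls value, the "url" and "title" dictionary keys, and the possibility that html_content arrives as bytes.

```python
import re
from html.parser import HTMLParser


class BaseParser:
    """Stand-in for minimal_web_scraper.BaseParser so this sketch runs on its own."""


class _LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href and text of every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None  # link currently being read, or None outside <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current = {"url": dict(attrs).get("href"), "title": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["title"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            # Collapse runs of whitespace inside the link text.
            title = re.sub(r"\s+", " ", self._current["title"]).strip()
            self._current["title"] = title
            self.links.append(self._current)
            self._current = None


class LinkParser(BaseParser):
    scope_urls = ["https://example.com/"]  # hypothetical value

    @classmethod
    def parse(cls, html_content, encoding):
        # Assumption: html_content may be bytes; decode it with the given encoding.
        if isinstance(html_content, bytes):
            html_content = html_content.decode(encoding or "utf-8", errors="replace")
        collector = _LinkCollector()
        collector.feed(html_content)
        collector.close()
        return collector.links  # a list of dictionaries, as required


html = '<p><a href="/a">First\n  link</a> and <a href="/b">Second</a></p>'
print(LinkParser.parse(html, "utf-8"))
# [{'url': '/a', 'title': 'First link'}, {'url': '/b', 'title': 'Second'}]
```

Because parse is a classmethod, it can be called on the class directly, without creating an instance. The same structure should carry over to BeautifulSoup: only the body of parse would change, while the return value remains a list of dictionaries.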
To see working parsers, check out the GitHub repository.