minimal-web-scraper 0.1.1 documentation#

minimal-web-scraper is a Python library that provides tools for scraping data from the web. It supports Python >=3.10,<4.0.

Note

Please be aware of the legal implications of scraping information from the internet: https://en.wikipedia.org/wiki/Web_scraping#Legal_issues

Note

The library is not published on PyPI, so it is installed from its GitHub repository URL, which requires git to be installed. See VCS support in the pip documentation.

Dependencies#

minimal-web-scraper depends on the following libraries:

requests (>=2.20,<3.0)
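
The version constraints above can be checked from Python before a project is set up. The following is a minimal sketch, not part of minimal-web-scraper: the helper names `version_tuple`, `in_range`, and `check_environment` are hypothetical, and only the major and minor version components are compared.

```python
import sys
from importlib import metadata
from itertools import takewhile


def version_tuple(text: str) -> tuple[int, ...]:
    """Return the (major, minor) components of a version string (hypothetical helper)."""
    parts = []
    for piece in text.split(".")[:2]:
        # Keep only the leading digits, so "0rc1" becomes 0.
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def in_range(version: tuple[int, ...], lower: tuple[int, ...], upper: tuple[int, ...]) -> bool:
    """True if lower <= version < upper, mirroring a '>=lower,<upper' constraint."""
    return lower <= version < upper


def check_environment() -> dict[str, bool]:
    """Compare the running interpreter and the installed requests against the documented ranges."""
    result = {"python": in_range(tuple(sys.version_info[:2]), (3, 10), (4, 0))}
    try:
        installed = metadata.version("requests")
        result["requests"] = in_range(version_tuple(installed), (2, 20), (3, 0))
    except metadata.PackageNotFoundError:
        # requests is not installed in this environment.
        result["requests"] = False
    return result


if __name__ == "__main__":
    print(check_environment())
```

In practice pip enforces these constraints at install time; a check like this is only useful for diagnosing an existing environment.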

Quick Start#

Here is a small script that demonstrates how to use the library. Note that the library does not provide any parsers itself. See the example in the GitHub repository for a working scraper.

# example.py

import pandas as pd

import minimal_web_scraper as scraper
from minimal_web_scraper import parsers

# import the parser modules so that their parsers can be added to the scraper's list of parsers
import parser_example

# add all imported parsers (every imported subclass of BaseParser)
parsers.add_parser()

# or add them manually
# parsers.add_parser(parser_example.BookParser)
# parsers.add_parser(parser_example.BooksParser)

# scrape the URL passed as argument and return a dictionary of parsed data
data = scraper.scrape("https://books.toscrape.com/")

# Pretty output formatting with pandas
books = pd.DataFrame(data=data, columns=["name", "price"])
print(books.head(5))
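
The script above calls `parsers.add_parser()` without arguments to add every imported subclass of `BaseParser`. One way such discovery could work is sketched below. This is a self-contained illustration under stated assumptions, not the library's actual implementation: the `BaseParser`, `BookParser`, and `BooksParser` classes here are empty stand-ins, and `registry`, `all_subclasses`, and the body of `add_parser` are hypothetical.

```python
from __future__ import annotations


class BaseParser:
    """Stand-in for the library's BaseParser (assumption: parsers subclass it)."""


class BookParser(BaseParser):
    """Stand-in for a parser defined in parser_example."""


class BooksParser(BaseParser):
    """Stand-in for a second parser defined in parser_example."""


# Hypothetical module-level list of registered parser classes.
registry: list[type[BaseParser]] = []


def all_subclasses(cls: type) -> list[type]:
    """Collect direct and indirect subclasses of cls that have been defined so far."""
    found = []
    for sub in cls.__subclasses__():
        found.append(sub)
        found.extend(all_subclasses(sub))
    return found


def add_parser(parser: type[BaseParser] | None = None) -> None:
    """Register one parser class, or every known BaseParser subclass if none is given."""
    candidates = [parser] if parser is not None else all_subclasses(BaseParser)
    for candidate in candidates:
        if candidate not in registry:  # avoid duplicate registrations
            registry.append(candidate)


add_parser()
print([p.__name__ for p in registry])
```

A class only appears in `__subclasses__()` once the module that defines it has been imported, which is consistent with the script importing `parser_example` before calling `parsers.add_parser()`.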

Note

This project is still in development, and the author does not guarantee the stability of its API.

This is a very small library. If you need a comprehensive and proven web scraping framework, check out Scrapy.

Get started#

See Get started for step-by-step instructions on setting up a project using minimal-web-scraper.

Get started
- Installation
- minimal-web-scraper

How-to guides#

See How-to guides for task-specific guides on using the library.

How-to guides
- Create a parser
- Import a parser
