scrachy.content

Classes and utilities for extracting textual content from the HTML body.

Classes

BaseContentExtractor(settings)

A content extractor base class that keeps track of the project middleware.

ContentExtractor(*args, **kwargs)

Modules

scrachy.content.boilerpipe

Content extraction using BoilerPy3.

scrachy.content.bs4

Content extraction using Beautiful Soup.
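
As a rough illustration of what extracting textual content from an HTML body involves, the sketch below uses only Python's standard-library html.parser. It is not scrachy's implementation and does not use BoilerPy3 or Beautiful Soup. The names TextOnlyParser and extract_text are hypothetical and exist only for this example.

```python
from html.parser import HTMLParser


class TextOnlyParser(HTMLParser):
    """Hypothetical helper: collect visible text, skipping non-body content."""

    # Text inside these elements is not part of the readable body text.
    SKIPPED_TAGS = {"head", "script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []     # stripped, non-empty text fragments

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        stripped = data.strip()
        if self._skip_depth == 0 and stripped:
            self._chunks.append(stripped)

    def text(self):
        # Join fragments with single spaces; original whitespace is not preserved.
        return " ".join(self._chunks)


def extract_text(html: str) -> str:
    """Return the visible text of an HTML document (illustrative only)."""
    parser = TextOnlyParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.text()
```

Calling extract_text on a page's HTML returns its visible text as a single space-separated string. Dedicated libraries such as the two listed above handle malformed markup and boilerplate removal far more thoroughly than this sketch.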