scrachy.content.BaseContentExtractor
- class scrachy.content.BaseContentExtractor(settings: Settings)
Bases: ContentExtractor
A content extractor base class that keeps track of the project middleware.
- Parameters:
settings – The Scrapy Settings.
- __init__(settings: Settings)
A content extractor base class that keeps track of the project middleware.
- Parameters:
settings – The Scrapy Settings.
Methods
__init__(settings): A content extractor base class that keeps track of the project middleware.
get_content(html): Get the desired textual content from the HTML.
- get_content(html: str) → str
Get the desired textual content from the HTML.
- Parameters:
html – The textual HTML to process.
- Returns:
The desired content (e.g., text with tags removed).
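The interface described above (a constructor that receives the settings and a get_content method that maps an HTML string to text) could be implemented roughly as follows. This is only a sketch under stated assumptions: BaseContentExtractorStandIn is a hypothetical stand-in that mirrors the documented signatures so the example runs without scrachy installed, a plain dict stands in for the Scrapy Settings object, and the settings attribute name and the TagStrippingExtractor class are illustrative, not part of the library.

```python
from html.parser import HTMLParser


class BaseContentExtractorStandIn:
    """Hypothetical stand-in mirroring the documented interface.

    This is NOT the real scrachy class; in a real project you would
    presumably subclass scrachy.content.BaseContentExtractor instead.
    """

    def __init__(self, settings):
        # Assumption: the base class simply keeps a reference to the settings.
        self.settings = settings

    def get_content(self, html: str) -> str:
        raise NotImplementedError


class _TextCollector(HTMLParser):
    """Collects the text nodes of a document and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


class TagStrippingExtractor(BaseContentExtractorStandIn):
    """Illustrative extractor that returns the text with tags removed."""

    def get_content(self, html: str) -> str:
        collector = _TextCollector()
        collector.feed(html)
        collector.close()
        # Concatenate the text nodes and collapse runs of whitespace.
        return " ".join("".join(collector.parts).split())


# A plain dict stands in for the Scrapy Settings object here.
extractor = TagStrippingExtractor(settings={})
text = extractor.get_content("<h1>Title</h1>\n<p>Hello <b>world</b>.</p>")
print(text)
```

A subclass only has to override get_content; what counts as "desired content" (all text, main article body, and so on) is left to the implementation.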