scrachy.content.ContentExtractor

class scrachy.content.ContentExtractor(*args, **kwargs)

Bases: Protocol

__init__(*args, **kwargs)

Methods

__init__(*args, **kwargs)

get_content(html)

Get the desired textual content from the HTML.

get_content(html: str) → str

Get the desired textual content from the HTML.

Parameters:

html – The textual HTML to process.

Returns:

The desired content (e.g., text with tags removed).
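Because ContentExtractor is a Protocol, any class that provides a matching get_content(html: str) → str method should satisfy it structurally, without inheriting from it. The following is a minimal sketch of such a class, built only on the standard library's html.parser. The names ContentExtractorLike, _TextCollector and TagStrippingExtractor are hypothetical and not part of scrachy. ContentExtractorLike is a local stand-in that mirrors the documented signature so the example runs without scrachy installed.

```python
from html.parser import HTMLParser
from typing import Protocol


class ContentExtractorLike(Protocol):
    """Hypothetical local stand-in mirroring the documented signature."""

    def get_content(self, html: str) -> str: ...


class _TextCollector(HTMLParser):
    """Collects text nodes, ignoring the bodies of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts = []        # text fragments in document order
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


class TagStrippingExtractor:
    """Hypothetical extractor: returns the text with tags removed."""

    def get_content(self, html: str) -> str:
        collector = _TextCollector()
        collector.feed(html)
        collector.close()
        # Join fragments with a space, then collapse runs of whitespace.
        return " ".join(" ".join(collector.parts).split())


# The annotation lets a static type checker verify the structural match.
extractor: ContentExtractorLike = TagStrippingExtractor()
print(extractor.get_content("<p>Hello <b>world</b></p>"))
```

Joining fragments with a space is a simplification: text split by inline tags in the middle of a word (for example `<b>Hel</b>lo`) comes out with an extra space. A production extractor would likely need a real HTML parsing library and rules for which elements count as content.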