scrachy.http_.CachedResponseMixin

class scrachy.http_.CachedResponseMixin(scrape_timestamp: datetime | None = None, extracted_text: str | None = None, body_length: int | None = None, extracted_text_length: int | None = None, scrape_history: list[ScrapeHistory] | None = None, *args, **kwargs)[source]

Bases: object

A mixin for Scrapy response classes that carries a subset of the extra information stored in the cache.

Parameters:
  • scrape_timestamp – The date and time the request was most recently scraped.

  • extracted_text – The plain text extracted from the HTML.

  • body_length – The total number of bytes in the downloaded HTML.

  • extracted_text_length – The number of bytes in the extracted plain text.

  • scrape_history – The list of ScrapeHistory entries for the request.
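As a rough illustration of how the keyword parameters in the signature above might be supplied, here is a minimal sketch. It does not import scrachy or Scrapy: StandInResponse and CacheInfoMixin are hypothetical stand-ins that only mirror the documented parameter names, and forwarding the remaining arguments with super().__init__ is an assumption about how such a mixin could cooperate with a response class, not something this page confirms.

```python
from __future__ import annotations

from datetime import datetime, timezone


class StandInResponse:
    """Hypothetical stand-in for a response class (not Scrapy's API)."""

    def __init__(self, url: str, body: bytes = b""):
        self.url = url
        self.body = body


class CacheInfoMixin:
    """Hypothetical mixin that mirrors the documented keyword parameters."""

    def __init__(
        self,
        scrape_timestamp: datetime | None = None,
        extracted_text: str | None = None,
        body_length: int | None = None,
        extracted_text_length: int | None = None,
        scrape_history: list | None = None,
        *args,
        **kwargs,
    ):
        # Assumption: anything the mixin does not consume is passed on
        # to the next class in the method resolution order.
        super().__init__(*args, **kwargs)
        self.scrape_timestamp = scrape_timestamp
        self.extracted_text = extracted_text
        self.body_length = body_length
        self.extracted_text_length = extracted_text_length
        self.scrape_history = list(scrape_history or [])


class StandInCachedResponse(CacheInfoMixin, StandInResponse):
    """The mixin is listed first so its __init__ runs before the response's."""


body = b"<html><body>Hello</body></html>"
text = "Hello"

# Keyword arguments are used throughout because the mixin's own
# parameters come first in the signature.
response = StandInCachedResponse(
    url="https://example.com/",
    body=body,
    scrape_timestamp=datetime(2024, 1, 1, tzinfo=timezone.utc),
    extracted_text=text,
    body_length=len(body),
    extracted_text_length=len(text.encode("utf-8")),
)

print(response.url, response.body_length, response.extracted_text_length)
```

Passing everything by keyword avoids a positional argument being bound to scrape_timestamp by accident, since the cache-related parameters precede *args in the signature.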

__init__(scrape_timestamp: datetime | None = None, extracted_text: str | None = None, body_length: int | None = None, extracted_text_length: int | None = None, scrape_history: list[ScrapeHistory] | None = None, *args, **kwargs)[source]

A mixin for Scrapy response classes that carries a subset of the extra information stored in the cache.

Parameters:
  • scrape_timestamp – The date and time the request was most recently scraped.

  • extracted_text – The plain text extracted from the HTML.

  • body_length – The total number of bytes in the downloaded HTML.

  • extracted_text_length – The number of bytes in the extracted plain text.

  • scrape_history – The list of ScrapeHistory entries for the request.

Methods

__init__([scrape_timestamp, extracted_text, ...])

A mixin for Scrapy response classes that carries a subset of the extra information stored in the cache.