scrachy.http_.CachedHtmlResponse

class scrachy.http_.CachedHtmlResponse(*args: Any, **kwargs: Any)

Bases: CachedResponseMixin, HtmlResponse

A subclass of scrapy.http.HtmlResponse that contains a subset of the extra information stored in the cache.

Parameters:
  • scrape_timestamp – The most recent date the request was scraped.

  • body_number_of_bytes – The total number of bytes of the downloaded HTML.

  • text_number_of_bytes – The number of bytes in the extracted plain text.

  • body_text – The text extracted from the HTML.
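To make the relationship between the three size-related parameters concrete, here is a minimal sketch using only the standard library. It assumes that body_number_of_bytes is the length of the raw body and that text_number_of_bytes is the encoded length of the extracted text; the helper summarize_html and the _TextExtractor class are hypothetical and are not part of scrachy, which may extract text differently.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical extractor: collect text nodes, skipping <script>/<style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def summarize_html(body: bytes, encoding: str = "utf-8") -> dict:
    """Compute values analogous to the parameters documented above (assumed semantics)."""
    parser = _TextExtractor()
    parser.feed(body.decode(encoding))
    parser.close()
    body_text = " ".join(parser.chunks)
    return {
        "body_number_of_bytes": len(body),
        "text_number_of_bytes": len(body_text.encode(encoding)),
        "body_text": body_text,
    }


summary = summarize_html(b"<html><body><h1>Title</h1><p>Hello</p></body></html>")
```

In this sketch the extracted text is much smaller than the raw body, which is the kind of difference the two byte counts are meant to capture.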

__init__(*args, **kwargs)

A subclass of scrapy.http.HtmlResponse that contains a subset of the extra information stored in the cache.

Parameters:
  • scrape_timestamp – The most recent date the request was scraped.

  • body_number_of_bytes – The total number of bytes of the downloaded HTML.

  • text_number_of_bytes – The number of bytes in the extracted plain text.

  • body_text – The text extracted from the HTML.

Methods

__init__(*args, **kwargs)

A subclass of scrapy.http.HtmlResponse that contains a subset of the extra information stored in the cache.

copy()

Return a copy of this Response.

css(query)

Shortcut method implemented only by responses whose content is text (subclasses of TextResponse).

follow(url[, callback, method, headers, ...])

Return a Request instance to follow a link url.

follow_all([urls, callback, method, ...])

A generator that produces Request instances to follow all links in urls.

jmespath(query, **kwargs)

Shortcut method implemented only by responses whose content is text (subclasses of TextResponse).

json()

Deserialize a JSON document to a Python object.

replace(*args, **kwargs)

Create a new Response with the same attributes, except for those given new values.

urljoin(url)

Join this Response's url with a possible relative url to form an absolute interpretation of the latter.

xpath(query, **kwargs)

Shortcut method implemented only by responses whose content is text (subclasses of TextResponse).

Attributes

attributes

A tuple of str objects containing the name of all public attributes of the class that are also keyword parameters of the __init__ method.

body

cb_kwargs

encoding

meta

selector

text

The response body decoded to a str.

url

attributes: Tuple[str, ...] = ('url', 'status', 'headers', 'body', 'flags', 'request', 'certificate', 'ip_address', 'protocol', 'encoding')

A tuple of str objects containing the name of all public attributes of the class that are also keyword parameters of the __init__ method.

Currently used by Response.replace().

copy()

Return a copy of this Response.

css(query)

Shortcut method implemented only by responses whose content is text (subclasses of TextResponse).

follow(url, callback=None, method='GET', headers=None, body=None, cookies=None, meta=None, encoding=None, priority=0, dont_filter=False, errback=None, cb_kwargs=None, flags=None) → Request

Return a Request instance to follow a link url. It accepts the same arguments as the Request.__init__ method, but url need not be an absolute URL; it can also be:

  • a relative URL

  • a Link object, e.g. the result of Link Extractors

  • a Selector object for a <link> or <a> element, e.g. response.css('a.my_link')[0]

  • an attribute Selector (not SelectorList), e.g. response.css('a::attr(href)')[0] or response.xpath('//img/@src')[0]

See "A shortcut for creating Requests" in the Scrapy documentation for usage examples.

follow_all(urls=None, callback=None, method='GET', headers=None, body=None, cookies=None, meta=None, encoding=None, priority=0, dont_filter=False, errback=None, cb_kwargs=None, flags=None, css=None, xpath=None) → Generator[Request, None, None]

A generator that produces Request instances to follow all links in urls. It accepts the same arguments as the Request’s __init__ method, except that each element of urls does not need to be an absolute URL; it can be any of the following:

  • a relative URL

  • a Link object, e.g. the result of Link Extractors

  • a Selector object for a <link> or <a> element, e.g. response.css('a.my_link')[0]

  • an attribute Selector (not SelectorList), e.g. response.css('a::attr(href)')[0] or response.xpath('//img/@src')[0]

In addition, css and xpath arguments are accepted to perform the link extraction within the follow_all method (only one of urls, css and xpath is accepted).

Note that when passing a SelectorList as the urls argument, or when using the css or xpath parameters, this method will not produce requests for selectors from which links cannot be obtained (for instance, anchor tags without an href attribute).
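The skipping behaviour described in the note can be illustrated with a standard-library stand-in. This is only a sketch of the idea, not Scrapy's implementation: the helper followable_urls and the _HrefCollector class are hypothetical, and in a real spider the equivalent call would be something like response.follow_all(css='a', callback=self.parse).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _HrefCollector(HTMLParser):
    """Hypothetical helper: gather href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # anchors without an href produce no link
                self.hrefs.append(href)


def followable_urls(base_url: str, html: str) -> list:
    """Return absolute URLs for every anchor that actually carries a link."""
    collector = _HrefCollector()
    collector.feed(html)
    collector.close()
    return [urljoin(base_url, href) for href in collector.hrefs]


html = '<a href="/page/2">next</a><a name="top">top</a><a href="about.html">about</a>'
urls = followable_urls("https://example.com/list/", html)
```

Here the middle anchor has no href, so only two URLs are produced, and both relative links are resolved against the page URL.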

jmespath(query, **kwargs)

Shortcut method implemented only by responses whose content is text (subclasses of TextResponse).

json()

New in version 2.2.

Deserialize a JSON document to a Python object.
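As a rough illustration, and assuming the method behaves like decoding the body and passing it to the standard json module, the result for a JSON body would look like the sketch below. The sample body is made up for the example; an HTML body that is not valid JSON would be expected to raise a decoding error instead.

```python
import json

# Hypothetical response body containing a JSON document.
body = b'{"title": "Example", "tags": ["a", "b"]}'

# Decode the bytes and deserialize them into Python objects.
data = json.loads(body.decode("utf-8"))

# Invalid JSON raises json.JSONDecodeError, a subclass of ValueError.
try:
    json.loads("<html></html>")
    html_is_json = True
except json.JSONDecodeError:
    html_is_json = False
```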

replace(*args, **kwargs)

Create a new Response with the same attributes, except for those given new values.

property text: str

The response body decoded to a str.

urljoin(url)

Join this Response’s url with a possible relative url to form an absolute interpretation of the latter.
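The joining rules can be previewed with urllib.parse.urljoin from the standard library. This sketch assumes the page declares no <base> element, in which case the response URL serves as the base; the example URLs are made up.

```python
from urllib.parse import urljoin

# Hypothetical URL of the response being processed.
response_url = "https://example.com/catalog/page1.html"

# A sibling document replaces the last path segment.
sibling = urljoin(response_url, "page2.html")

# A root-relative path replaces the whole path.
rooted = urljoin(response_url, "/about")

# A parent-relative path climbs one directory before descending.
parent = urljoin(response_url, "../img/logo.png")

# An absolute URL is returned unchanged.
absolute = urljoin(response_url, "https://other.example/x")
```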

xpath(query, **kwargs)

Shortcut method implemented only by responses whose content is text (subclasses of TextResponse).